How to efficiently perform “distinct” with multiple keys?

If you are willing to wait for the upcoming 2.2 release of MongoDB, you can run this query efficiently using the aggregation framework:

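// Group on the compound key (market, code); each distinct pair yields one result document.
var collection = db.tb;
var result = collection.aggregate(
    [
        { "$group": { "_id": { market: "$market", code: "$code" } } }
    ]
);
printjson(result);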

On a million-record collection on my test machine, this ran in 4 seconds, while the map/reduce version took over a minute.
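
To make clear what the $group stage above computes, here is a minimal plain-JavaScript sketch that emulates it in memory. It is only an illustration of the semantics, not how MongoDB implements grouping; the function name distinctPairs and the sample documents (including the market and code values) are made up for this example.

```javascript
// Hypothetical sample documents; only the field names come from the query above.
var docs = [
    { market: "NYSE", code: "IBM", price: 1 },
    { market: "NYSE", code: "IBM", price: 2 },
    { market: "NYSE", code: "GE",  price: 3 },
    { market: "LSE",  code: "IBM", price: 4 }
];

// In-memory emulation of
//   { "$group": { "_id": { market: "$market", code: "$code" } } }
// Returns one { _id: { market, code } } object per distinct pair,
// in order of first appearance.
function distinctPairs(docs) {
    var seen = {};
    var out = [];
    docs.forEach(function (doc) {
        // Serialize the pair so it can be used as a lookup key.
        var key = JSON.stringify([doc.market, doc.code]);
        if (!seen[key]) {
            seen[key] = true;
            out.push({ _id: { market: doc.market, code: doc.code } });
        }
    });
    return out;
}

var pairs = distinctPairs(docs);
console.log(JSON.stringify(pairs));
```

The two NYSE/IBM documents collapse into a single entry, which is the "distinct on multiple keys" behaviour the aggregation gives you on the server side without shipping every document to the client.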
