How does sorting with an index work in MongoDB?

Indexes in MongoDB are stored in a B-tree structure, where each index entry points to a specific location on disk. Because a B-tree keeps its entries in sorted order, a MongoDB index is always traversed in order. This makes it cheap for MongoDB to fetch a series of documents in sorted order via the index.

Update: The B-tree description holds for the MMAPv1 storage engine. The WiredTiger storage engine (the default since MongoDB 3.2) implements indexes slightly differently. The basic idea remains the same: it is cheap to traverse the index in sorted order.
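To make the "sorted order" point concrete, here is a minimal JavaScript sketch (plain Node.js, not the mongo shell). It models a single-field ascending index as a sorted array of [key, recordId] entries. This is a toy model assumed purely for illustration; real B-tree or WiredTiger pages are far more involved. The function names are hypothetical. A range predicate such as {b:{$gt:100}} becomes one binary search for the first qualifying key, followed by an in-order walk, so the matching documents come out already sorted.

```javascript
// Toy model of a single-field ascending index: [key, recordId] entries kept
// sorted by key (numeric keys only in this sketch). Not MongoDB's real structure.
function buildIndex(docs, field) {
  return docs
    .map((doc, recordId) => [doc[field], recordId]) // key plus "pointer" to the document
    .sort((x, y) => x[0] - y[0] || x[1] - y[1]);
}

// Binary search for the position of the first entry whose key is > value.
function firstGreaterThan(index, value) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan for {field: {$gt: value}}: seek once, then walk the entries in order.
function scanGreaterThan(index, docs, value) {
  const out = [];
  for (let i = firstGreaterThan(index, value); i < index.length; i++) {
    out.push(docs[index[i][1]]); // "fetch" the document the entry points to
  }
  return out;
}

const docs = [{ b: 150 }, { b: 50 }, { b: 300 }, { b: 120 }, { b: 100 }];
const idx = buildIndex(docs, "b");
console.log(scanGreaterThan(idx, docs, 100).map((d) => d.b)); // [ 120, 150, 300 ]
```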

A SORT stage (i.e. an in-memory sort) in a query is limited to 32MB of memory use, and the query fails if the SORT stage exceeds this limit. (This was the limit at the time of writing. MongoDB 4.4 raised it to 100MB and added allowDiskUse support to find().) The limit can be sidestepped by exploiting the sorted nature of indexes. MongoDB can then return the results of a query with sort() without performing an in-memory sort.
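The key difference between an index-provided sort and a blocking SORT stage is buffering. A SORT stage must hold every matching document before it can emit the first result. The sketch below is a hypothetical JavaScript model of that behavior, not MongoDB code. It accumulates documents, estimates their size with JSON.stringify (a rough stand-in for BSON size, assumed here), and fails once a configurable budget is exceeded. The function name and error message are invented for illustration.

```javascript
// Hypothetical model of a blocking (in-memory) SORT stage with a memory budget.
// JSON string length is used as a rough stand-in for BSON document size.
const SORT_LIMIT_BYTES = 32 * 1024 * 1024; // 32MB, the limit discussed above

function blockingSort(docs, compare, limitBytes = SORT_LIMIT_BYTES) {
  const buffer = [];
  let used = 0;
  for (const doc of docs) {
    used += JSON.stringify(doc).length; // every matching document must be held
    if (used > limitBytes) {
      throw new Error(`SORT stage exceeded memory limit of ${limitBytes} bytes`);
    }
    buffer.push(doc);
  }
  // Nothing can be returned until all input has been buffered and sorted.
  return buffer.sort(compare);
}

const byC = (x, y) => x.c - y.c;
const small = [{ c: 3 }, { c: 1 }, { c: 2 }];
console.log(blockingSort(small, byC).map((d) => d.c)); // [ 1, 2, 3 ]

try {
  blockingSort(small, byC, 10); // tiny budget; each document is 7 bytes of JSON
} catch (e) {
  console.log(e.message); // SORT stage exceeded memory limit of 10 bytes
}
```

By contrast, an index scan hands documents back one at a time in index order, so it needs no such buffer.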

Let us assume that the query is of the shape:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort(...)

with collection a having an index of:

    db.a.createIndex({b:1,c:1})

There are two possible scenarios when sort() is specified in the query:

1. MongoDB cannot use the sorted nature of the index and must perform an in-memory SORT stage.

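This is the outcome if the sort keys are not an “index prefix” (the leading fields of the index) and the index fields before them are not fixed by equality matches. For example: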

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({c:1})

In the query above, the index {b:1,c:1} can be used to match documents having b greater than 100 (the {b:{$gt:100}} portion of the query). However, there is no guarantee that the returned documents are sorted by c. Within the index, entries are ordered by b first, and c is only ordered among entries that share the same value of b.

Therefore, MongoDB has no choice but to perform an in-memory sort. The explain() output of this query will have a SORT stage, and this SORT stage is limited to 32MB of memory use.
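The reason c is not sorted becomes clear once the {b:1,c:1} index entries are laid out in order. In the toy JavaScript model below (an illustration, not MongoDB's implementation), entries are ordered by b and then by c only among ties on b. Filtering them with the query predicate therefore yields documents whose c values are out of order, and a separate pass is needed to sort them. That pass is what the SORT stage does.

```javascript
// Toy model of the compound index {b: 1, c: 1}: ordered by b, then by c.
function compareBC(x, y) {
  return x.b - y.b || x.c - y.c;
}

const docs = [
  { b: 150, c: 900 },
  { b: 120, c: 300 },
  { b: 300, c: 250 },
  { b: 120, c: 500 },
  { b: 50, c: 999 },
];

// An index scan visits entries in index order and keeps those matching the query.
const indexOrder = [...docs].sort(compareBC);
const matches = indexOrder.filter((d) => d.b > 100 && d.c > 200);
// The c values below are in index order, which is not sorted by c.
console.log(matches.map((d) => d.c)); // [ 300, 500, 900, 250 ]

// sort({c: 1}) therefore needs an extra in-memory pass (the SORT stage).
const sortedByC = [...matches].sort((x, y) => x.c - y.c);
console.log(sortedByC.map((d) => d.c)); // [ 250, 300, 500, 900 ]
```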

2. MongoDB can use the sorted nature of the index.

This is the outcome if the query's sort specification:

uses sort keys that match the order of the index keys (or a prefix of them), and
specifies the same ordering as the index, or its exact reverse (i.e. the index {b:1,c:1} can be used for sort({b:1,c:1}) or sort({b:-1,c:-1}) but not sort({b:1,c:-1})).

For example:

    db.a.find({b:{$gt:100}, c:{$gt:200}}).sort({b:1})

In the query above, the index {b:1,c:1} can be used to match documents having b greater than 100 (the {b:{$gt:100}} portion of the query). Because the index is walked in order of b, MongoDB can guarantee that the returned documents are sorted by b.

The explain() output of the query above will not have a SORT stage. In fact, the query plan shown by explain() is the same with and without sort(). In essence, we are getting the sort() for free.
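For contrast with the SORT-stage case, the same kind of toy JavaScript model of the {b:1,c:1} index (an illustration, not MongoDB internals) shows the opposite result. A forward walk over the entries, filtered by the query predicate, already yields documents ordered by b. Walking the same entries backward yields the reverse order, which is why sort({b:-1}) can use this index too.

```javascript
// Toy model of the compound index {b: 1, c: 1}: ordered by b, then by c.
function compareBC(x, y) {
  return x.b - y.b || x.c - y.c;
}

const docs = [
  { b: 150, c: 900 },
  { b: 120, c: 300 },
  { b: 300, c: 250 },
  { b: 120, c: 500 },
  { b: 50, c: 999 },
];

// Walking the index forward and filtering already yields documents sorted by b,
// so sort({b: 1}) needs no extra pass.
const matches = [...docs].sort(compareBC).filter((d) => d.b > 100 && d.c > 200);
console.log(matches.map((d) => d.b)); // [ 120, 120, 150, 300 ]

// Walking the same index backward gives sort({b: -1}) without a SORT stage as well.
const backward = [...docs].sort(compareBC).reverse().filter((d) => d.b > 100 && d.c > 200);
console.log(backward.map((d) => d.b)); // [ 300, 150, 120, 120 ]
```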

A worthwhile resource for understanding this subject is the blog post Optimizing MongoDB Compound Indexes. Please note that it was written back in 2012. Although some of the terminology may be outdated, its technical content is still relevant.

Update on follow-up questions

MongoDB uses only one index for most queries. So for example, to avoid an in-memory SORT stage in the query
```
db.a.find({a:1}).sort({b:1})
```
the index must cover both a and b fields at the same time; e.g. a compound index such as {a:1,b:1} is required. You cannot have two separate indexes {a:1} and {b:1}, and expect the {a:1} index to be used for the equality part, and the {b:1} index to be used for the sort part. In this case, MongoDB will choose one of the two indexes.

In other words, when the index supplies the sort, the results come back sorted because documents are looked up and returned in index order.
To avoid an in-memory sort using a compound index, the first part of the index must cater to the equality part of the query, and the second part must cater to the sort part of the query (as with the {a:1,b:1} index for the query above).

If you have a query like this:
```
db.a.find({}).sort({a:1})
```
the index {a:1,b:1} can be used for the sort part (since you’re basically returning the whole collection). And if your query looks like this:
```
db.a.find({a:1}).sort({b:1})
```
the same index {a:1,b:1} can also be used for both parts of the query. Also:
```
db.a.find({a:1,b:1})
```
can also use the same index {a:1,b:1}.

Notice the pattern here: the find() parameters, followed by the sort() parameters, follow the index order {a:1,b:1}. Therefore, a compound index must be ordered by equality -> sort.
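The equality -> sort pattern can be summarized as a small check. The JavaScript function below, canSortWithIndex, is a hypothetical helper, not part of any MongoDB API. It takes an index key pattern, the list of fields the query matches by equality, and a sort specification, and reports whether an index scan could return results in the requested order. It is a simplified sketch. It first skips leading index fields that have equality matches. It then requires the sort keys to line up with the following index keys, with directions that either all match the index (forward scan) or are all reversed (backward scan). The real query planner handles more cases than this.

```javascript
// Hypothetical helper (not a MongoDB API): can the key pattern `indexSpec` return
// results in `sortSpec` order without a blocking SORT stage, given the fields the
// query matches by equality? Simplified rule: leading index fields with equality
// matches may be skipped; the sort keys must then line up with the next index
// keys, with all directions matching (forward scan) or all reversed (backward).
function canSortWithIndex(indexSpec, equalityFields, sortSpec) {
  const indexKeys = Object.entries(indexSpec);
  const sortKeys = Object.entries(sortSpec);
  if (sortKeys.length === 0) return true; // nothing to sort

  // Skip leading index fields pinned to a single value by the query.
  let start = 0;
  while (
    start < indexKeys.length &&
    indexKeys[start][0] !== sortKeys[0][0] &&
    equalityFields.includes(indexKeys[start][0])
  ) {
    start++;
  }
  if (start + sortKeys.length > indexKeys.length) return false;

  let forward = true;
  let backward = true;
  sortKeys.forEach(([field, dir], i) => {
    const [indexField, indexDir] = indexKeys[start + i];
    if (field !== indexField) forward = backward = false;
    if (dir !== indexDir) forward = false; // breaks a forward scan
    if (dir !== -indexDir) backward = false; // breaks a backward scan
  });
  return forward || backward;
}

const index = { a: 1, b: 1 };
console.log(canSortWithIndex(index, [], { a: 1 }));     // true:  find({}).sort({a:1})
console.log(canSortWithIndex(index, ["a"], { b: 1 }));  // true:  find({a:1}).sort({b:1})
console.log(canSortWithIndex(index, ["a"], { b: -1 })); // true:  backward scan
console.log(canSortWithIndex(index, [], { b: 1 }));     // false: no equality match on a
```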

Update regarding sorting of different types

If a field has different types between documents (e.g. if a is a string in one document, a number in others, and a boolean in yet another), how does the sort proceed?

The answer is MongoDB's BSON type comparison order. To paraphrase the manual page, the order is:

MinKey (internal type)
Null
Numbers (ints, longs, doubles, decimals)
Symbol, String
Object
Array
BinData
ObjectId
Boolean
Date
Timestamp
Regular Expression
MaxKey (internal type)

So, in the example above, an ascending sort returns the documents where a is a number first, then those where it is a string, then those where it is a boolean.
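The cross-type rule can be sketched as a comparator that first compares a type rank and only then compares values of the same type. The JavaScript below is a simplified model with hypothetical function names. It covers only a few types that map directly onto JavaScript values: null, numbers, strings, booleans, and Date. The ranks follow the positions in the list above. Objects, arrays, and the other BSON types are deliberately left out because their comparison rules are more involved. Strings are compared with JavaScript's < operator, which is assumed here as an approximation of MongoDB's default binary string comparison when no collation is set.

```javascript
// Simplified model of BSON comparison order for a few JS-representable types.
// Ranks follow the positions in the list above; other BSON types are omitted.
function typeRank(v) {
  if (v === null) return 1;               // Null
  if (typeof v === "number") return 2;    // Numbers (compared by value)
  if (typeof v === "string") return 3;    // Symbol, String
  if (typeof v === "boolean") return 8;   // Boolean
  if (v instanceof Date) return 9;        // Date
  throw new TypeError("type not covered by this sketch");
}

function bsonCompare(x, y) {
  const rx = typeRank(x);
  const ry = typeRank(y);
  if (rx !== ry) return rx - ry;          // different types: type order decides
  if (rx === 1) return 0;                 // null equals null
  const vx = rx === 9 ? x.getTime() : x;  // Dates compare by timestamp
  const vy = ry === 9 ? y.getTime() : y;
  return vx < vy ? -1 : vx > vy ? 1 : 0;  // same type: compare values
}

const docs = [{ a: true }, { a: "apple" }, { a: 10 }, { a: false }, { a: 2 }, { a: "Zebra" }];
docs.sort((x, y) => bsonCompare(x.a, y.a)); // ascending sort on field a
console.log(docs.map((d) => d.a)); // [ 2, 10, 'Zebra', 'apple', false, true ]
```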
