Mongodb Explain for Aggregation framework

Starting with MongoDB version 3.0, simply changing the order from

collection.aggregate(...).explain()

to

collection.explain().aggregate(...)

will give you the desired results (documentation here).

For older versions >= 2.6, you will need to use the explain option for aggregation pipeline operations

explain:true

db.collection.aggregate([
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: {$or: [{"Tags._id":"tag1"},{"Tags._id":"tag2"}]}},
    { $group: { 
        _id : "$_id",
        count: { $sum:1 } 
    }},
    {$sort: {"count":-1}}
  ],
  {
    explain:true
  }
)

An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, $geonear at the beginning of a pipeline) as well as subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).

Optimizing pipelines

In general, you can optimize aggregation pipelines by:

  • Starting a pipeline with a $match stage to restrict processing to relevant documents.
  • Ensuring the initial $match / $sort stages are supported by an efficient index.
  • Filtering data early using $match, $limit , and $skip .
  • Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
  • Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions including support for working with arrays, strings, and facets.

There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.

Limitations

As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.

There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines:

Leave a Comment