mongodb aggregate query isn’t returning proper sum on using $sum

Your current schema has the marks field data type as string and you need an integer data type for your aggregation framework to work out the sum. On the other hand, you can use MapReduce to calculate the sum since it allows the use of native JavaScript methods like parseInt() on your object properties in its map functions. So overall you have two choices.


Option 1: Update Schema (Change Data Type)

The first would be to change the schema or add another field in your document that has the actual numerical value not the string representation. If your collection document size is relatively small, you could use a combination of the mongodb’s cursor find(), forEach() and update() methods to change your marks schema:

db.student.find({ "marks": { "$type": 2 } }).snapshot().forEach(function(doc) {
    db.student.update(
        { "_id": doc._id, "marks": { "$type": 2 } }, 
        { "$set": { "marks": parseInt(doc.marks) } }
    );
});

For relatively large collection sizes, your db performance will be slow and it’s recommended to use mongo bulk updates for this:

MongoDB versions >= 2.6 and < 3.2:

var bulk = db.student.initializeUnorderedBulkOp(),
    counter = 0;

db.student.find({"marks": {"$exists": true, "$type": 2 }}).forEach(function (doc) {    
    bulk.find({ "_id": doc._id }).updateOne({ 
        "$set": { "marks": parseInt(doc.marks) } 
    });

    counter++;
    if (counter % 1000 === 0) {
        // Execute per 1000 operations 
        bulk.execute(); 

        // re-initialize every 1000 update statements
        bulk = db.student.initializeUnorderedBulkOp();
    }
})

// Clean up remaining operations in queue
if (counter % 1000 !== 0) bulk.execute(); 

MongoDB version 3.2 and newer:

var ops = [],
    cursor = db.student.find({"marks": {"$exists": true, "$type": 2 }});

cursor.forEach(function (doc) {     
    ops.push({ 
        "updateOne": { 
            "filter": { "_id": doc._id } ,              
            "update": { "$set": { "marks": parseInt(doc.marks) } } 
        }         
    });

    if (ops.length === 1000) {
        db.student.bulkWrite(ops);
        ops = [];
    }     
});

if (ops.length > 0) db.student.bulkWrite(ops);

Option 2: Run MapReduce

The second approach would be to rewrite your query with MapReduce where you can use the JavaScript function parseInt().

In your MapReduce operation, define the map function that process each input document. This function maps the converted marks string value to the subject for each document, and emits the subject and converted marks pair. This is where the JavaScript native function parseInt() can be applied. Note: in the function, this refers to the document that the map-reduce operation is processing:

var mapper = function () {
    var x = parseInt(this.marks);
    emit(this.subject, x);
};

Next, define the corresponding reduce function with two arguments keySubject and valuesMarks. valuesMarks is an array whose elements are the integer marks values emitted by the map function and grouped by keySubject.
The function reduces the valuesMarks array to the sum of its elements.

var reducer = function(keySubject, valuesMarks) {
    return Array.sum(valuesMarks);
};

db.student.mapReduce(
    mapper,
    reducer,
    {
        out : "example_results",
        query: { subject : "maths" }       
    }
 );

With your collection, the above will put your MapReduce aggregation result in a new collection db.example_results. Thus, db.example_results.find() will output:

/* 0 */
{
    "_id" : "maths",
    "value" : 163
}

Leave a Comment