Understanding JavaScript callbacks in Node.js, especially in loops

If the callback is defined in the same scope the loop is defined in (which is frequently the case), then the callback will have access to the index variable. Leaving aside NodeJS particulars for a moment, let’s consider this function:

function doSomething(callback) {
    callback();
}

That function accepts a callback function reference and all it does is call it. Not very exciting. 🙂

Now let’s use that in a loop:

var index;

for (index = 0; index < 3; ++index) {
    doSomething(function() {
        console.log("index = " + index);
    });
}

(In compute-intensive code, such as a server process, it’s best not to do literally this in production. We’ll come back to that in a moment.)

Now, when we run that, we see the expected output:

index = 0
index = 1
index = 2

Our callback was able to access index, because the callback is a closure over the data in scope where it’s defined. (Don’t worry about the term “closure,” closures are not complicated.)

The reason I said it’s probably best not to do literally that in compute-intensive production code is that it creates a new function on every iteration. (V8 is very clever, but optimizing away the creation of those functions is non-trivial.) So here’s a slightly reworked example:

var index;

for (index = 0; index < 3; ++index) {
    doSomething(doSomethingCallback);
}

function doSomethingCallback() {
    console.log("index = " + index);
}

This may look a bit surprising, since doSomethingCallback is declared after the loop that uses it, but function declarations are hoisted, so it is available throughout the scope. It still works the same way and produces the same output, because doSomethingCallback is still a closure over index, so it sees the value of index as of when it’s called. But now there’s only one doSomethingCallback function, rather than a fresh one on every iteration.

Now let’s take a negative example, something that doesn’t work:

foo();

function foo() {
    var index;

    for (index = 0; index < 3; ++index) {
        doSomething(myCallback);
    }
}

function myCallback() {
    console.log("index = " + index); // <== ReferenceError: index is not defined
}

That fails because myCallback is not defined in the same scope (or a nested scope) that index is defined in. As far as myCallback is concerned, index doesn’t exist, so trying to read it throws a ReferenceError.
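One way to make that work is to hand the value to myCallback explicitly instead of relying on scope. This is only a sketch, and it assumes we’re free to change myCallback to take a parameter. The parameter name value and the logged array are mine, added purely for illustration.

```javascript
// Hypothetical rework of the failing example: pass the value in
// rather than expecting myCallback to find index in its own scope.
var logged = []; // illustration only: records what the callback saw

function doSomething(callback) {
    callback();
}

function foo() {
    var index;

    for (index = 0; index < 3; ++index) {
        // This wrapper is defined inside foo, so it can see index
        doSomething(function() {
            myCallback(index);
        });
    }
}

function myCallback(value) {
    logged.push(value);
    console.log("index = " + value);
}

foo();
```

Note that this goes back to creating a wrapper function on every iteration, so the earlier caveat about compute-intensive code applies here as well.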

Finally, let’s consider setting up event handlers in a loop, because one has to be careful with that. Here we will dive into Node.js a bit:

var spawn = require('child_process').spawn;

var commands = [
    {cmd: 'ls', args: ['-lh', '/etc' ]},
    {cmd: 'ls', args: ['-lh', '/usr' ]},
    {cmd: 'ls', args: ['-lh', '/home']}
];
var index, command, child;

for (index = 0; index < commands.length; ++index) {
    command = commands[index];
    child = spawn(command.cmd, command.args);
    child.on('exit', function() {
        console.log("Process index " + index + " exited"); // <== WRONG
    });
}

It seems like the above should work the same way that our earlier loops did, but there’s a crucial difference. In our earlier loops, the callback was called immediately, so it saw the correct index value because index hadn’t had a chance to move on yet. Here, though, the loop runs to completion before any of the callbacks is called. The result? We see:

Process index 3 exited
Process index 3 exited
Process index 3 exited

This is a crucial point. A closure doesn’t have a copy of the data it closes over; it has a live reference to it. By the time the exit callback for each of those processes runs, the loop has already completed, so all three calls see the same index value (its value as of the end of the loop, which is 3).
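You can see the same live-reference behavior without spawning any processes. The following is just a sketch: the pending and seen arrays are hypothetical names, with pending standing in for “callbacks that get run later.”

```javascript
// Illustration only: queue the callbacks during the loop,
// then run them after the loop has finished.
var pending = [];
var seen = [];
var index;

for (index = 0; index < 3; ++index) {
    pending.push(function() {
        // Reads the shared index variable when called, not when created
        seen.push(index);
    });
}

// The loop is done, so index is now 3
pending.forEach(function(callback) {
    callback();
});

console.log(seen); // [ 3, 3, 3 ]
```

All three functions close over the one index variable, so they all report whatever it holds at the time they run.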

We can fix this by having the callback use a different variable that won’t change, like this:

var spawn = require('child_process').spawn;

var commands = [
    {cmd: 'ls', args: ['-lh', '/etc' ]},
    {cmd: 'ls', args: ['-lh', '/usr' ]},
    {cmd: 'ls', args: ['-lh', '/home']}
];
var index, command, child;

for (index = 0; index < commands.length; ++index) {
    command = commands[index];
    child = spawn(command.cmd, command.args);
    child.on('exit', makeExitCallback(index));
}

function makeExitCallback(i) {
    return function() {
        console.log("Process index " + i + " exited");
    };
}

Now we output the correct values (in whatever order the processes exit):

Process index 1 exited
Process index 2 exited
Process index 0 exited

The way that works is that the callback we assign to the exit event closes over the i argument of the call we make to makeExitCallback, not over index. Each call to makeExitCallback gets its own i, so the first callback it creates and returns closes over the i for the first call, the second callback closes over the i for the second call (a different variable from the first call’s i), and so on. Nothing ever changes those i values afterward.
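If your version of Node.js supports ES2015 (an assumption on my part; the examples above use var throughout), declaring the loop variable with let gives a similar result without a factory function, because a let in a for loop header gets a fresh binding on each iteration. Here’s a minimal sketch using deferred calls rather than real child processes; laterCalls and results are hypothetical names for illustration.

```javascript
// Illustration only: each iteration of a let-based for loop
// has its own i, so each closure keeps the value it was created with.
var laterCalls = [];
var results = [];

for (let i = 0; i < 3; ++i) {
    laterCalls.push(function() {
        results.push(i);
    });
}

// Run the callbacks after the loop has finished
laterCalls.forEach(function(callback) {
    callback();
});

console.log(results); // [ 0, 1, 2 ]
```

The makeExitCallback approach has the advantage of working in older environments too, and it makes the “one variable per callback” idea explicit.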

If you give the article linked above a read, a number of things should be clearer. The terminology in the article is a bit dated (ECMAScript 5 uses updated terminology), but the concepts haven’t changed.
