What must be wrapped in then() statements in CasperJS? How to determine execution order of sync/async functions?

Question

Rule of thumb: All CasperJS functions which contain the words then and wait are asynchronous. This statement has many exceptions.

What is `then()` doing?

CasperJS is organized as a series of steps that handle the control flow of your script. then() handles the many PhantomJS/SlimerJS event types that define the ending of a step. When then() is called, the passed function is put into a step queue, which is a simple JavaScript array. If the previous step has finished, either because it was a simple synchronous function or because CasperJS detected that specific events were triggered, the next step begins execution. This repeats until all steps have been executed.

All those step functions are bound to the casper object, so you can refer to that object using this.
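The queueing and the `this` binding can be sketched in plain JavaScript. This is only a toy model that runs without CasperJS; miniCasper and its fields are hypothetical names, and the real implementation additionally waits for browser events between steps.

```javascript
// Toy model only: a step queue in the spirit of then()/run().
// `miniCasper` and its fields are hypothetical, not CasperJS internals.
var miniCasper = {
    steps: [],   // the step queue is a plain array
    output: [],  // collects echoed messages
    echo: function (message) {
        this.output.push(message);
    },
    then: function (stepFn) {
        this.steps.push(stepFn);  // only queued, not executed yet
        return this;              // returning the object allows chaining
    },
    run: function () {
        for (var i = 0; i < this.steps.length; i++) {
            this.steps[i].call(this);  // bind `this` to the flow object
        }
        return this;
    }
};

miniCasper.then(function () {
    this.echo("first step");
}).then(function () {
    this.echo("second step");
});

var queuedBeforeRun = miniCasper.steps.length;   // two steps are queued
var outputBeforeRun = miniCasper.output.length;  // nothing has run yet
miniCasper.run();
console.log(miniCasper.output.join(", "));  // prints "first step, second step"
```

Nothing is echoed until run() is called, which mirrors the point that then() only registers a step.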

The following simple script shows two steps:

casper.start("http://example.com", function(){
    this.echo(this.getTitle());
}).run();

The first step is an implicit asynchronous (“stepped”) open() call behind start(). The start() function also takes an optional callback which itself is the second step in this script.

During the execution of the first step the page is opened. When the page is completely loaded, PhantomJS triggers the onLoadFinished event, CasperJS triggers its own events and continues with the next step. The second step is a simple completely synchronous function, so nothing fancy is happening here. When this is done, CasperJS exits, because there are no more steps to execute.

There is an exception to this rule: When a function is passed into the run() function, it will be executed as the last step instead of the default exit. If you don’t call exit() or die() in there, you will need to kill the process.

How does `then()` detect that the next step has to wait?

Consider the following example:

casper.then(function(){
    this.echo(this.getTitle());
    this.fill(...)
    this.click("#search");
}).then(function(){
    this.echo(this.getTitle());
});

If an event that denotes the loading of a new page is triggered during the execution of a step, CasperJS waits for the page load before executing the next step. In this case a click was triggered, which in turn triggered an onNavigationRequested event from the underlying browser. CasperJS sees this and suspends execution using callbacks until the next page is loaded. Other triggers of this kind are form submissions or client-side JavaScript that performs its own redirect with window.open() or window.location.

Of course, this breaks down when we are talking about single page applications (with a static URL). PhantomJS cannot detect that, for example, a different template is being rendered after a click, and therefore cannot wait until it has finished loading (this can take some time when data is loaded from the server). If the following steps depend on the new page, you will need to use e.g. waitUntilVisible() to look for a selector that is unique to the page to be loaded.

What do you call this API style?

Some people call it Promises, because of the way steps can be chained. Aside from the name (then()) and the action chain, there are no similarities. No result is passed from callback to callback through the step chain in CasperJS. Either you store your result in a global variable or you add it to the casper object. Error handling is also limited: when an error is encountered, CasperJS will die in the default configuration.
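The difference from Promises can be illustrated with a toy runner that, like the step chain described here, discards return values. The names queueStep, runSteps and sharedTitle are hypothetical, and the title string is just sample data; this is a sketch of the idea, not CasperJS code.

```javascript
// Toy runner: steps receive no argument and their return values are dropped.
var queuedSteps = [];

function queueStep(stepFn) {
    queuedSteps.push(stepFn);
}

function runSteps() {
    queuedSteps.forEach(function (stepFn) {
        stepFn();  // nothing is passed in, the return value is ignored
    });
}

var sharedTitle;                 // shared state lives outside of the steps
var receivedArgument = "unset";  // records what the second step was given

queueStep(function () {
    sharedTitle = "Example Domain";  // sample value stored for later steps
    return "this value goes nowhere";
});
queueStep(function (previousResult) {
    receivedArgument = previousResult;  // stays undefined in this model
});
runSteps();

console.log(sharedTitle, receivedArgument);  // prints "Example Domain undefined"
```

With real Promises the second callback would receive the first callback's return value; here the only channel between steps is the outer variable.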

I prefer to call it a Builder pattern, because execution only starts when you call run(), and every call before that merely puts steps into the queue (see 1st question). That is why it doesn’t make sense to write synchronous functions outside of step functions. Simply put, they are executed without any context: the page hasn’t even begun loading.

Of course, calling it a builder pattern is not the whole truth. Steps can be nested: if you schedule a step inside of another step, it is put into the queue after the current step and after all other steps that were already scheduled from the current step. (That’s a lot of steps!)

The following script is a good illustration of what I mean:

casper.on("load.finished", function(){
    this.echo("1 -> 3");
});
casper.on("load.started", function(){
    this.echo("2 -> 2");
});
casper.start('http://example.com/');
casper.echo("3 -> 1");
casper.then(function() {
    this.echo("4 -> 4");
    this.then(function() {
        this.echo("5 -> 6");
        this.then(function() {
            this.echo("6 -> 8");
        });
        this.echo("7 -> 7");
    });
    this.echo("8 -> 5");
});
casper.then(function() {
    this.echo("9 -> 9");
});
casper.run();

The first number shows the position of the synchronous code snippet in the script and the second one shows the actual executed/printed position, because echo() is synchronous.

Important points:

Number 3 comes first
Number 8 is printed between 4 and 5

To avoid confusion and hard-to-find problems, always call asynchronous functions after the synchronous functions in a single step. If that seems impossible, split the step into multiple steps or consider recursion.
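The ordering of the numbered echo() calls (leaving out the two load events) can be reproduced with a toy queue that splices nested steps in right after the current step. This is a minimal sketch under that assumption; createFlow and insertAt are hypothetical names and the real CasperJS bookkeeping is more involved.

```javascript
// Toy model of nested step scheduling; NOT CasperJS's actual implementation.
function createFlow() {
    var steps = [];       // queued step functions
    var running = false;  // true while run() is executing steps
    var insertAt = 0;     // queue position for the next nested step
    var log = [];

    var flow = {
        log: log,
        echo: function (message) {
            log.push(message);
        },
        then: function (stepFn) {
            if (running) {
                // Nested step: goes after the current step and after
                // siblings already scheduled from the current step.
                steps.splice(insertAt, 0, stepFn);
                insertAt += 1;
            } else {
                steps.push(stepFn);  // top level: append to the queue
            }
            return flow;
        },
        run: function () {
            running = true;
            var index = 0;
            while (index < steps.length) {
                var step = steps[index];
                index += 1;
                insertAt = index;  // nested steps land right after this one
                step.call(flow);
            }
            running = false;
            return flow;
        }
    };
    return flow;
}

var flow = createFlow();
flow.echo("3");  // synchronous code outside of steps runs immediately
flow.then(function () {
    this.echo("4");
    this.then(function () {
        this.echo("5");
        this.then(function () {
            this.echo("6");
        });
        this.echo("7");
    });
    this.echo("8");
});
flow.then(function () {
    this.echo("9");
});
flow.run();

console.log(flow.log.join(" "));  // prints "3 4 8 5 7 6 9"
```

The synchronous echo("8") runs before the nested step that prints 5, because the nested step is only queued while the outer step is still executing.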

How does `waitFor()` work?

waitFor() is the most flexible function in the wait* family, because every other function uses this one.

In its most basic form (passing only a check function and nothing else), waitFor() schedules one step. The check function is called repeatedly until the condition is met or the (global) timeout is reached. When a then and/or onTimeout step function is passed additionally, it is called in the respective case.

It is important to note that if waitFor() times out and you didn’t pass an onTimeout callback (which is essentially an error catch function), the script will stop execution:

casper.start().waitFor(function checkCb(){
    return false;
}, function thenCb(){
    this.echo("inner then");
}, null, 1000).then(function() {
    this.echo("outer");
}).run();
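The polling and timeout behaviour can be sketched with a synchronous toy function that uses a simulated clock instead of real timers. pollUntil, intervalMs and the returned strings are hypothetical, and throwing an error stands in for the script stopping; the real waitFor() is asynchronous.

```javascript
// Toy, synchronous model of waitFor()-style polling with a simulated clock.
function pollUntil(check, thenCb, onTimeout, timeoutMs, intervalMs) {
    for (var elapsed = 0; elapsed < timeoutMs; elapsed += intervalMs) {
        if (check(elapsed)) {
            if (thenCb) {
                thenCb(elapsed);  // condition met: continue with `then`
            }
            return "ok";
        }
    }
    if (onTimeout) {
        onTimeout(timeoutMs);  // timeout handled, the flow may continue
        return "timeout handled";
    }
    // No onTimeout callback: the toy throws where the script would stop.
    throw new Error("Wait timeout of " + timeoutMs + "ms expired");
}

var messages = [];
try {
    pollUntil(function checkCb() {
        return false;  // the condition is never met
    }, function thenCb() {
        messages.push("inner then");
    }, null, 1000, 100);
    messages.push("outer");  // never reached in this run
} catch (error) {
    messages.push("stopped: " + error.message);
}

console.log(messages.join(", "));  // prints "stopped: Wait timeout of 1000ms expired"
```

Neither "inner then" nor "outer" is recorded, which matches the statement that a timeout without an onTimeout callback stops the script.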

What are other functions that are also asynchronous step functions?

As of 1.1-beta3 there are the following additional asynchronous functions that don’t follow the rule of thumb:

Casper module: back(), forward(), reload(), repeat(), start(), withFrame(), withPopup()
Tester module: begin()

If you’re not sure, check the source code to see whether a specific function uses then() or wait().

Are event listeners asynchronous?

Event listeners can be registered using casper.on(listenerName, callback) and they are triggered using casper.emit(listenerName, values). As far as the internals of CasperJS are concerned, they are not asynchronous. The asynchronous handling comes from the functions in which those emit() calls are made. CasperJS simply passes most PhantomJS events through, so that is where the asynchronicity comes from.
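Synchronous dispatch can be observed with the built-in events module of Node.js, used here only as a stand-in. That casper.on() and casper.emit() behave the same way is an assumption based on the description above, and the event name and the "success" value are just sample data.

```javascript
// Stand-in demo using Node.js's EventEmitter, not the casper object itself.
var EventEmitter = require("events");

var emitter = new EventEmitter();
var order = [];

emitter.on("load.finished", function (status) {
    order.push("listener: " + status);
});

order.push("before emit");
emitter.emit("load.finished", "success");  // listener runs before emit() returns
order.push("after emit");

console.log(order.join(" | "));  // prints "before emit | listener: success | after emit"
```

The listener output appears between the two surrounding lines, so emit() itself is a blocking call; any asynchronicity comes from whoever calls emit().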

Can I break out of the control flow?

The control or execution flow is the way CasperJS executes the script. When we break out of the control flow, we need to manage a second flow (or even more). This will complicate the development and maintainability of the script immensely.

As an example, suppose you want to call an asynchronous function that is defined somewhere. Let’s assume that there is no way to rewrite the function so that it is synchronous.

function longRunningFunction(callback) {
    ...
    callback(data);
    ...
}
var result;
casper.start(url, function(){
    longRunningFunction(function(data){
        result = data;
    });
}).then(function(){
    this.open(urlDependsOnFunResult???);
}).then(function(){
    // do something with the dynamically opened page
}).run();

Now we have two flows which depend on one another.

Another way to directly split the flow is to use the JavaScript functions setTimeout() and setInterval(). Since CasperJS provides waitFor(), there is no need to use those.

Can I return to the CasperJS control flow?

When a control flow must be merged back into the CasperJS flow, the obvious solution is to set a global variable in the separate flow and to wait for it to be set in the CasperJS flow.

Example is the same as in the previous question:

var result;
casper.start(url, function(){
    longRunningFunction(function(data){
        result = data;
    });
}).waitFor(function check(){
    return result; // `undefined` is evaluated to `false`
}, function then(){
    this.open(result.url);
}, null, 20000).then(function(){
    // do something with the dynamically opened page
}).run();

What is asynchronous in the test environment (Tester module)?

Technically, nothing is asynchronous in the tester module. Calling test.begin() simply executes the callback. Only when the callback itself uses asynchronous code (meaning test.done() is called asynchronously inside a single begin() callback) can the other begin() test cases be added to the test case queue.

That is why a single test case usually consists of a complete navigation with casper.start() and casper.run() and not the other way around:

casper.test.begin("description", function(test){
    casper.start("http://example.com").run(function(){
        test.assert(this.exists("a"), "At least one link exists");
        test.done();
    });
});

It’s best to stick to nesting a complete flow inside of begin(), since the start() and run() calls won’t be mixed between multiple flows. This enables you to use multiple complete test cases per file.

Notes:

When I talk about synchronous functions/execution, I mean a blocking call which can actually return the thing it computes.
