Why does

This raises the important point that the text inside <script> tags on an HTML page is parsed by the HTML parser before it is parsed by the JavaScript parser.

This code is not valid HTML5 syntax, so there is nothing in the HTML5 specification that would give us a clue about what is going on here. To be specific, there are two issues:

Both of these problems will put a browser’s HTML parser into an error parsing mode, which means it is trying to make sense of invalid syntax. What a browser does when trying to make sense of invalid syntax is undefined behavior, which technically means that anything can happen (such as nasal demons). The de facto behavior here seems to be that browsers agree on how they handle this case, but it is undefined behavior nonetheless.

For whatever reason, this combination of syntax issues next to each other causes browsers to ignore the text later in the document.


EDIT: I have identified how the parsing error is produced by stepping through this part of the HTML5 spec.

The text content of the script (excluding whitespace) is

var a="<!--<script>";

This must match the following grammar rule:

data1 *( escape [ script-start data3 ] "-->" data1 ) [ escape ]

We can begin parsing the text content by matching data1, which has the following rule:

data1         = < any string that doesn't contain a substring that matches not-data1 >
not-data1     = "<!--"    

That is, the string var a=" matches the data1 production. It ends there because the next part is <!--.
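The data1 step can be reproduced with a few lines of JavaScript. This is only a sketch: matchData1 is a hypothetical helper name, and it assumes that matching data1 amounts to cutting the text at the first occurrence of <!--.

```javascript
// Hypothetical helper (not part of any spec or library): data1 is the
// longest prefix of the text that does not contain "<!--".
function matchData1(text) {
  const i = text.indexOf("<!--");
  return i === -1
    ? { data1: text, rest: "" }
    : { data1: text.slice(0, i), rest: text.slice(i) };
}

const scriptText = 'var a="<!--<script>";';
const { data1, rest } = matchData1(scriptText);

console.log(JSON.stringify(data1)); // "var a=\""
console.log(JSON.stringify(rest));  // "<!--<script>\";"
```

Everything in rest now has to be accounted for by the escape production.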

For there to be any text afterwards in the script, it must match the escape production, which is as follows:

escape        = "<!--" data2 *( script-start data3 script-end data2 )

Let's match the next part of the text. So far we have

data1    var a="
escape   <!--
  data2  ???

Now nothing can be contained in data2 because the data2 production prohibits the substring <script> (i.e. a script-start)!

data2         = < any string that doesn't contain a substring that matches not-data2 >
not-data2     = script-start / "-->"  
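To see why data2 comes out empty, here is a rough JavaScript sketch. SCRIPT_START is my own transcription of the spec's script-start rule (the letters "script" in any case after "<", followed by a tab, line feed, form feed, space, "/" or ">"), and matchData2 is a hypothetical helper, so treat both as assumptions rather than as what a browser actually runs.

```javascript
// Assumed transcription of script-start into a regular expression.
const SCRIPT_START = /<script[\t\n\f \/>]/i;

// Hypothetical helper: data2 is everything up to the first script-start
// or "-->", whichever comes first.
function matchData2(text) {
  const stops = [text.search(SCRIPT_START), text.indexOf("-->")]
    .filter((i) => i !== -1);
  const end = stops.length ? Math.min(...stops) : text.length;
  return { data2: text.slice(0, end), rest: text.slice(end) };
}

// What follows the opening "<!--" in the example script.
const afterEscapeOpen = '<script>";';
const m = matchData2(afterEscapeOpen);

console.log(JSON.stringify(m.data2)); // ""
console.log(JSON.stringify(m.rest));  // "<script>\";"
```

The script-start sits at the very first position, so data2 cannot consume a single character.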

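With data2 empty, the only ways forward are the script-start data3 script-end group inside escape, or the [ script-start data3 ] "-->" continuation in the top-level rule. The first needs a script-end (</script>) inside the text and the second needs a "-->", and the remaining text <script>"; contains neither. The lexer cannot proceed with valid steps according to the grammar, so the browser must now go into error processing.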
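The whole check can be approximated by translating the grammar into one regular expression. This is a sketch under assumptions: script-start, script-end, data3 and not-data3 are transcribed from my reading of the same part of the spec (only data1, data2 and escape are quoted above), isValidScriptText is a made-up name, and I have not checked every boundary case against the ABNF.

```javascript
// Assumed transcription of the ABNF into regular-expression fragments.
const SS = "<script[\\t\\n\\f />]";     // script-start
const SE = "<\\/script[\\t\\n\\f />]";  // script-end
const DATA1 = "(?:(?!<!--)[\\s\\S])*";           // no "<!--"
const DATA2 = `(?:(?!${SS}|-->)[\\s\\S])*`;      // no script-start, no "-->"
const DATA3 = `(?:(?!${SE}|-->)[\\s\\S])*`;      // no script-end, no "-->"
const ESCAPE = `<!--${DATA2}(?:${SS}${DATA3}${SE}${DATA2})*`;

// data1 *( escape [ script-start data3 ] "-->" data1 ) [ escape ]
const SCRIPT = new RegExp(
  `^${DATA1}(?:${ESCAPE}(?:${SS}${DATA3})?-->${DATA1})*(?:${ESCAPE})?$`,
  "i"
);

// Hypothetical helper: does the text satisfy the grammar as transcribed?
function isValidScriptText(text) {
  return SCRIPT.test(text);
}

console.log(isValidScriptText('var a="<!--<script>";'));   // false
console.log(isValidScriptText('var a="<\\!--<script>";')); // true
```

The second call passes only because the backslash means the text no longer contains the substring <!--, so all of it matches data1.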
