Why does this bison code produce unexpected output?

Short answer:

yylval.strval = yytext;

You can’t use yytext like that. The string it points to belongs to the lexer, and you can’t rely on its contents once the flex action has finished. You need to do something like:

yylval.strval = strdup(yytext);

and then you need to make sure you free that copy once you no longer need it.
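As a rough sketch of that copy-then-free discipline, outside of any real flex or bison file: the names semantic_value, copy_token, save_token and release_token are invented for illustration, and copy_token simply does what strdup does, spelled out with malloc and memcpy so the sketch depends only on standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the %union member used in the question. */
typedef union {
    char *strval;
} semantic_value;

/* Same job as strdup(): return a heap copy of a NUL-terminated string,
   or NULL if allocation fails. */
char *copy_token(const char *text)
{
    size_t n = strlen(text) + 1;
    char *copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, text, n);
    return copy;
}

/* What the lexer action does: hand the parser a private copy.
   Returns 1 on success, 0 if the copy could not be allocated. */
int save_token(semantic_value *val, const char *text)
{
    val->strval = copy_token(text);
    return val->strval != NULL;
}

/* What the parser does once it has finished with the value. */
void release_token(semantic_value *val)
{
    free(val->strval);
    val->strval = NULL;
}
```

In a real grammar the release step would normally live in the bison action that consumes the token, and possibly in a %destructor so that tokens discarded during error recovery are freed as well.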


Longer answer:

yytext is actually a pointer into the buffer containing the input. To make yytext work as though it were a NUL-terminated string, the flex-generated scanner overwrites the character following the token with a NUL before it runs the action, and puts the original character back when it resumes scanning. So strdup works fine inside the action, but once the scanner has moved on (which, given the parser's lookahead, is usually before your bison action runs), your saved pointer is to the part of the buffer starting with the token and running on into whatever follows it. And it gets worse later: flex will read the next part of the source into the same buffer, and now your pointer is to random garbage. There are several possible scenarios, depending on flex options, but none of them are pretty.
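The in-place termination trick can be imitated with a toy word scanner. This is not flex's code, only a minimal model of the behaviour described above; toy_scanner, toy_init and toy_next are invented names, and the toy restores the overwritten character at the start of the next call.

```c
#include <ctype.h>
#include <stddef.h>

/* Toy scanner state; the names are invented for this sketch. */
typedef struct {
    char *pos;      /* where scanning resumes */
    char  held;     /* character overwritten by the NUL */
    char *held_at;  /* where that character lives, or NULL */
} toy_scanner;

void toy_init(toy_scanner *s, char *buffer)
{
    s->pos = buffer;
    s->held = '\0';
    s->held_at = NULL;
}

/* Returns a pointer to the next word inside the buffer (like yytext),
   or NULL at end of input. The word is NUL-terminated in place. */
char *toy_next(toy_scanner *s)
{
    /* Put back the character that the previous call overwrote. */
    if (s->held_at != NULL) {
        *s->held_at = s->held;
        s->held_at = NULL;
    }

    /* Skip whitespace between words. */
    while (*s->pos != '\0' && isspace((unsigned char)*s->pos))
        s->pos++;
    if (*s->pos == '\0')
        return NULL;

    char *start = s->pos;
    while (*s->pos != '\0' && !isspace((unsigned char)*s->pos))
        s->pos++;

    /* Temporarily terminate the token so it reads as a C string. */
    if (*s->pos != '\0') {
        s->held = *s->pos;
        s->held_at = s->pos;
        *s->pos = '\0';
    }
    return start;
}
```

Scanning the buffer "foo bar" with this toy, the first call returns a pointer that reads as "foo". After the second call returns "bar", that same first pointer reads as "foo bar", because the space that had been replaced by a NUL is back in place. A saved yytext goes stale in the same way.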

So the golden rule: yytext is only valid until the end of the action. If you want to keep it, copy it, and then make sure you free the storage for the copy when you no longer need it.

In almost all the lexers I’ve written, the ID token actually finds the identifier in a symbol table (or puts it there) and returns a pointer into the symbol table, which simplifies memory management. But you still have essentially the same memory management issue with, for example, character string literals.
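A minimal sketch of that symbol-table approach, assuming a plain linked list is good enough (a real compiler would more likely use a hash table). The names symbol, intern and free_symbols are made up for this example.

```c
#include <stdlib.h>
#include <string.h>

/* A deliberately simple symbol table: a linked list of owned names. */
typedef struct symbol {
    struct symbol *next;
    char          *name;  /* owned by the table */
} symbol;

/* Return the entry for `text`, adding a copy of it if it is new.
   Returns NULL only if allocation fails. */
symbol *intern(symbol **table, const char *text)
{
    for (symbol *s = *table; s != NULL; s = s->next)
        if (strcmp(s->name, text) == 0)
            return s;

    size_t n = strlen(text) + 1;
    symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = malloc(n);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->name, text, n);  /* private copy, independent of yytext */
    s->next = *table;
    *table = s;
    return s;
}

/* Free every entry; call once, when parsing is finished. */
void free_symbols(symbol **table)
{
    while (*table != NULL) {
        symbol *s = *table;
        *table = s->next;
        free(s->name);
        free(s);
    }
}
```

The lexer action would then pass yytext to intern and store the returned pointer in yylval. Each distinct identifier is copied exactly once, and everything is released in one place at the end, so individual grammar actions never have to free identifier strings.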
