lexer - w3toppers.com

ANTLR: What is the simplest way to implement a Python-like, indentation-dependent grammar?

I don’t know what the easiest way to handle it is, but the following is a relatively easy way. Whenever you match a line break in your lexer, optionally match one or more spaces. If there are spaces after the line break, compare the length of these spaces with the current indent-size. If it’s more …
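
The excerpt is cut off, so the following is only a minimal sketch of the general idea, written in Python rather than as an ANTLR grammar. It assumes the usual Python-style convention of a stack of indentation widths and emits hypothetical `INDENT`, `DEDENT` and `LINE` tokens; the function name and token names are illustrative and do not come from the original answer. Only spaces are counted as indentation.

```python
def indent_tokens(source):
    """Return (kind, text) pairs, adding INDENT/DEDENT around indented blocks."""
    stack = [0]   # indentation widths of the blocks currently open
    tokens = []
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:            # blank lines never change the indentation
            continue
        width = len(line) - len(stripped)
        if width > stack[-1]:       # deeper than the current level: open a block
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:    # shallower: close blocks until a level matches
            stack.pop()
            tokens.append(("DEDENT", ""))
        if width != stack[-1]:      # dedent to a width that was never opened
            raise SyntaxError(f"inconsistent indentation: {line!r}")
        tokens.append(("LINE", stripped))
    while len(stack) > 1:           # close blocks still open at end of input
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

A parser can then treat `INDENT` and `DEDENT` like opening and closing braces, which keeps the grammar itself context-free.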

Is it a Lexer’s Job to Parse Numbers and Strings?

The simple answer is “Yes”. In the abstract, you don’t need lexers at all. You could simply write a grammar that used individual characters as tokens (and in fact that’s exactly what SGLR parsers do, but that’s a story for another day). You need lexers because parsers built using characters as primitive elements aren’t as …

When parsing Javascript, what determines the meaning of a slash?

It’s actually fairly easy, but it requires making your lexer a little smarter than usual. The division operator must follow an expression, and a regular expression literal can’t follow an expression, so in all other cases you can safely assume you’re looking at a regular expression literal. You already have to identify Punctuators as multiple-character …
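
As a rough illustration of that rule, here is a small Python sketch of a lexer that remembers the kind of the previous significant token and uses it to decide what a `/` means. The token kinds, the keyword list and the patterns are simplifying assumptions made for this example, not a full ECMAScript lexer: comments are not handled, punctuators are single characters, and a closing `)` or `]` is always assumed to end an expression.

```python
import re

# Kinds after which "/" is assumed to be the division operator.
ENDS_EXPRESSION = {"NUMBER", "NAME", "STRING", "REGEX", "CLOSE"}
# Keywords that are followed by an operand, so "/" after them starts a regex.
KEYWORDS_BEFORE_OPERAND = {"return", "typeof", "case", "throw", "delete",
                           "void", "in", "instanceof", "new", "do", "else"}

REGEX_LITERAL = re.compile(r"/(?:\\.|\[(?:\\.|[^\]\\\n])*\]|[^/\\\n\[])+/[A-Za-z]*")
OTHER = re.compile(r"""
    (?P<WS>\s+)
  | (?P<NUMBER>\d+(?:\.\d+)?)
  | (?P<NAME>[A-Za-z_$][\w$]*)
  | (?P<STRING>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<CLOSE>[)\]])
  | (?P<PUNCT>[^\s\w/])
""", re.VERBOSE)

def tokenize(src):
    tokens, pos, prev = [], 0, None   # prev: kind of the last non-space token
    while pos < len(src):
        if src[pos] == "/":
            if prev in ENDS_EXPRESSION:
                kind, text = "DIV", "/"
            else:
                m = REGEX_LITERAL.match(src, pos)
                if not m:
                    raise SyntaxError(f"invalid regex literal at offset {pos}")
                kind, text = "REGEX", m.group()
        else:
            m = OTHER.match(src, pos)
            if not m:
                raise SyntaxError(f"unexpected character at offset {pos}")
            kind, text = m.lastgroup, m.group()
            if kind == "NAME" and text in KEYWORDS_BEFORE_OPERAND:
                kind = "KEYWORD"
        pos += len(text)
        if kind != "WS":              # whitespace never changes the decision
            tokens.append((kind, text))
            prev = kind
    return tokens
```

With this sketch, `tokenize("a / b")` yields a `DIV` token between two names, while `tokenize("x = /ab+c/g")` yields a single `REGEX` token for `/ab+c/g`.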

Lexer written in Javascript? [closed]

Something like http://jscc.phorward-software.com/, maybe? JS/CC is the first available parser development system for JavaScript and ECMAScript derivatives. It has been developed both with the intention of building a productive compiler development system and with the intention of creating an easy-to-use academic environment for people interested in how parse table generation is done in general in bottom-up parsing. …

Looking for a clear definition of what a “tokenizer”, “parser” and “lexer” are and how they are related to each other and used?

A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context to the tokens: this token is a number, that token is a string literal, this other token is an equality operator. A parser takes …
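
To make the distinction concrete, here is a small Python sketch under the definitions above. The function names and the token kinds (`NUMBER`, `STRING`, `EQ`, `NAME`) are made up for illustration. Note that `lex` silently skips characters that match no rule, which a real lexer would report as an error.

```python
import re

def tokenize(text):
    """Tokenizer: split on whitespace and return the bare pieces."""
    return text.split()

# Lexer: the same kind of split, but every token also carries a type.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("STRING", r'"[^"]*"'),
    ("EQ",     r"=="),
    ("NAME",   r"[A-Za-z_]\w*"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))

def lex(text):
    """Return (kind, text) pairs; m.lastgroup names the rule that matched."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)]
```

For the input `count == 42`, `tokenize` returns three plain strings, while `lex` returns the same three pieces labelled `NAME`, `EQ` and `NUMBER`, which is the information a parser needs.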

How does the ANTLR lexer disambiguate its rules (or why does my parser produce “mismatched input” errors)?

In ANTLR, the lexer is isolated from the parser, which means it will split the text into typed tokens according to the lexer grammar rules, and the parser has no influence on this process (it cannot say “give me an INTEGER now” for instance). It produces a token stream by itself. Furthermore, the parser doesn’t …
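
The excerpt stops before the disambiguation rules themselves. ANTLR's lexer prefers the rule that matches the longest input and, when several rules match the same length, the one defined first. The Python sketch below emulates only that behaviour with ordinary regular expressions; it is not ANTLR's implementation, and the rule names (`IF`, `ID`, `INT`, `WS`) are hypothetical.

```python
import re

# Order matters: on equal-length matches the earlier rule wins.
RULES = [
    ("IF",  r"if"),
    ("ID",  r"[a-z]+"),
    ("INT", r"[0-9]+"),
    ("WS",  r"[ \t\r\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]

def lex(text):
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict ">" keeps the earlier rule when two matches tie in length.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise SyntaxError(f"no rule matches at offset {pos}")
        if best_name != "WS":         # whitespace is skipped, not emitted
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens
```

Here `if` becomes an `IF` token because it ties with `ID` and `IF` is listed first, while `iffy` becomes an `ID` because that match is longer. The parser sees only the resulting token types, so a grammar that expected a different type at that point fails with a mismatch instead of asking the lexer to try again.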

Poor man’s “lexer” for C#

The original version I posted here as an answer had a problem in that it only worked while there was more than one “Regex” that matched the current expression. That is, as soon as only one Regex matched, it would return a token – whereas most people want the Regex to be “greedy”. This was …

lexers vs parsers

What parsers and lexers have in common: They read symbols of some alphabet from their input. Hint: The alphabet doesn’t necessarily have to be of letters. But it has to be of symbols which are atomic for the language understood by parser/lexer. Symbols for the lexer: ASCII characters. Symbols for the parser: the particular tokens, …