How to write a Parser in C#? [closed]

I have implemented several parsers in C#, both hand-written and tool-generated.

A very good introductory tutorial on parsing in general is Jack Crenshaw’s Let’s Build a Compiler. It demonstrates how to build a recursive descent parser, and any competent developer can easily translate the concepts from the tutorial’s language (I think it was Pascal) to C#. This will teach you how a recursive descent parser works, but it is completely impractical to write a full programming language parser by hand.
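To give a feel for what such a tutorial builds, here is a minimal sketch of a recursive descent parser in C#. The grammar and the names (`ExprParser`, `Evaluate`) are my own assumptions for illustration, not taken from the tutorial; each grammar rule simply becomes one method.

```csharp
using System;
using System.Globalization;

// Hypothetical grammar, chosen only for illustration:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')' | '-' factor
public sealed class ExprParser
{
    private readonly string _text;
    private int _pos;

    private ExprParser(string text) { _text = text; }

    // Parses and evaluates the whole input; trailing garbage is an error.
    public static double Evaluate(string text)
    {
        var p = new ExprParser(text);
        double value = p.ParseExpr();
        p.SkipWhitespace();
        if (p._pos != p._text.Length)
            throw new FormatException($"Unexpected '{p._text[p._pos]}' at {p._pos}");
        return value;
    }

    private double ParseExpr()
    {
        double left = ParseTerm();
        while (true)
        {
            if (Accept('+')) left += ParseTerm();
            else if (Accept('-')) left -= ParseTerm();
            else return left;
        }
    }

    private double ParseTerm()
    {
        double left = ParseFactor();
        while (true)
        {
            if (Accept('*')) left *= ParseFactor();
            else if (Accept('/')) left /= ParseFactor();
            else return left;
        }
    }

    private double ParseFactor()
    {
        if (Accept('-')) return -ParseFactor();
        if (Accept('('))
        {
            double inner = ParseExpr();
            if (!Accept(')')) throw new FormatException($"Expected ')' at {_pos}");
            return inner;
        }
        return ParseNumber();
    }

    private double ParseNumber()
    {
        SkipWhitespace();
        int start = _pos;
        // ASCII digits only, so double.Parse never sees digits it cannot handle.
        while (_pos < _text.Length &&
               ((_text[_pos] >= '0' && _text[_pos] <= '9') || _text[_pos] == '.'))
            _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    // Consumes the character c if it is next (after whitespace).
    private bool Accept(char c)
    {
        SkipWhitespace();
        if (_pos < _text.Length && _text[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void SkipWhitespace()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
    }
}
```

Calling `ExprParser.Evaluate("1 + 2 * 3")` walks expr, term and factor in turn, so operator precedence falls out of the nesting of the methods. A real language needs many more rules, which is where the hand-written approach starts to hurt.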

If you are determined to write a classical recursive descent parser, you should look into tools that generate the code for you (TinyPG, Coco/R, Irony). Keep in mind that there are now other ways to write parsers that usually perform better and have easier definitions, e.g. TDOP (top-down operator precedence) parsing or monadic parsing.
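As a rough illustration of the TDOP idea, here is a small sketch in C#. The binding-power numbers and the names (`PrattParser`, `BindingPower`) are assumptions I picked for the example; the point is that precedence lives in a table rather than in one method per grammar level.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class PrattParser
{
    // Binding power per infix operator; the values are hypothetical,
    // only their relative order matters.
    private static readonly Dictionary<char, int> BindingPower = new Dictionary<char, int>
    {
        ['+'] = 10, ['-'] = 10, ['*'] = 20, ['/'] = 20, ['^'] = 30
    };

    private const int UnaryMinusPower = 25; // binds tighter than '*', looser than '^'

    private readonly string _text;
    private int _pos;

    private PrattParser(string text) { _text = text; }

    public static double Evaluate(string text)
    {
        var p = new PrattParser(text);
        double value = p.ParseExpression(0);
        if (p.Peek() != '\0')
            throw new FormatException($"Unexpected '{p.Peek()}' at {p._pos}");
        return value;
    }

    // Core TDOP loop: keep consuming infix operators that bind tighter than minPower.
    private double ParseExpression(int minPower)
    {
        double left = ParsePrefix();
        while (true)
        {
            char op = Peek();
            if (!BindingPower.TryGetValue(op, out int power) || power <= minPower)
                return left;
            _pos++; // consume the operator
            // '^' is right-associative: recurse with a slightly lower power
            // so that a following '^' is absorbed into the right operand.
            double right = ParseExpression(op == '^' ? power - 1 : power);
            left = Apply(op, left, right);
        }
    }

    // Handles tokens that can start an expression: unary minus, parentheses, numbers.
    private double ParsePrefix()
    {
        char c = Peek();
        if (c == '-') { _pos++; return -ParseExpression(UnaryMinusPower); }
        if (c == '(')
        {
            _pos++;
            double inner = ParseExpression(0);
            if (Peek() != ')') throw new FormatException($"Expected ')' at {_pos}");
            _pos++;
            return inner;
        }
        int start = _pos;
        while (_pos < _text.Length &&
               ((_text[_pos] >= '0' && _text[_pos] <= '9') || _text[_pos] == '.'))
            _pos++;
        if (start == _pos) throw new FormatException($"Expected a number at {_pos}");
        return double.Parse(_text.Substring(start, _pos - start), CultureInfo.InvariantCulture);
    }

    private static double Apply(char op, double a, double b)
    {
        switch (op)
        {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;
            case '^': return Math.Pow(a, b);
            default: throw new InvalidOperationException($"Unknown operator '{op}'");
        }
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char Peek()
    {
        while (_pos < _text.Length && char.IsWhiteSpace(_text[_pos])) _pos++;
        return _pos < _text.Length ? _text[_pos] : '\0';
    }
}
```

Adding an operator here means adding one table entry and one `Apply` case, which is what I mean by easier definitions; `PrattParser.Evaluate("2 ^ 3 ^ 2")` groups to the right without any extra grammar rule.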

On the topic of whether C# is up to the task: C# has some of the best text libraries out there. A lot of the parsers today (in other languages) have an obscene amount of code just to deal with Unicode etc. I won’t comment too much on JITted code because the discussion can get quite religious; however, you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it’s written in F#), and its performance is just shy of Google V8.

Side note: Markup parsers are completely different beasts from language parsers. In the majority of cases they are written by hand, and at the scanner/parser level they are very simple. They are not usually recursive descent, and especially in the case of XML it is better if you don’t write a recursive descent parser: this avoids stack overflows on deeply nested input, and a ‘flat’ parser can be used in SAX/push mode.
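A minimal sketch of what I mean by a ‘flat’ push-mode parser, assuming a toy markup with only `<name>`, `</name>` and text (no attributes, comments or entities, so this is nowhere near real XML). The name `FlatMarkupScanner` and the callback shape are hypothetical. Nesting is tracked on a heap-allocated `Stack<string>`, so depth is not limited by the call stack.

```csharp
using System;
using System.Collections.Generic;

public static class FlatMarkupScanner
{
    // Single loop over the input; events are pushed to the caller as they are found.
    public static void Scan(
        string input,
        Action<string> onStart,
        Action<string> onEnd,
        Action<string> onText)
    {
        var open = new Stack<string>(); // currently open elements
        int pos = 0;

        while (pos < input.Length)
        {
            if (input[pos] == '<')
            {
                int close = input.IndexOf('>', pos);
                if (close < 0) throw new FormatException($"Unterminated tag at {pos}");

                string body = input.Substring(pos + 1, close - pos - 1).Trim();
                if (body.StartsWith("/", StringComparison.Ordinal))
                {
                    // End tag: must match the most recently opened element.
                    string name = body.Substring(1).Trim();
                    if (open.Count == 0 || open.Pop() != name)
                        throw new FormatException($"Mismatched </{name}> at {pos}");
                    onEnd(name);
                }
                else
                {
                    if (body.Length == 0) throw new FormatException($"Empty tag at {pos}");
                    open.Push(body);
                    onStart(body);
                }
                pos = close + 1;
            }
            else
            {
                // Text runs up to the next '<' or the end of the input.
                int next = input.IndexOf('<', pos);
                if (next < 0) next = input.Length;
                onText(input.Substring(pos, next - pos));
                pos = next;
            }
        }

        if (open.Count != 0) throw new FormatException($"Unclosed <{open.Peek()}>");
    }
}
```

The caller passes three delegates and receives start, text and end events in document order, much like a SAX handler. Because there is no recursion, the same loop copes with arbitrarily deep nesting.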
