What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?

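A concrete syntax tree (also called a parse tree) represents the source text exactly as it was parsed: every token and every grammar rule that was applied shows up in the tree. Its shape follows the context-free grammar (CFG) that defines the source language.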

However, the concrete grammar and tree contain a lot of things that are necessary to make the source text unambiguously parseable but that do not contribute to its actual meaning. For example, to implement operator precedence, your CFG usually has several levels of expression components (term, factor, etc.), with the operators connecting them at the different levels (you add terms to get expressions, terms are composed of factors optionally multiplied together, etc.). To actually interpret or compile the language, however, you don’t need this; you just need Expression nodes that have operators and operands. The abstract syntax tree is the result of simplifying the concrete syntax tree down to the things actually needed to represent the meaning of the program. This tree has a much simpler definition and is thus easier to process in the later stages of compilation or interpretation.
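
To make the contrast concrete, here is a minimal Python sketch for the expression `1 + 2 * 3`. Everything in it is an assumption made for illustration: the three-level grammar in the comment, the tuple encoding of the concrete tree, and the names `Num`, `BinOp`, and `to_ast`. It is one possible way to collapse a concrete tree into an abstract one, not a standard API.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical grammar, assumed for this example only:
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'
#
# Concrete syntax tree for "1 + 2 * 3": one (rule_name, children...) tuple
# per grammar rule applied, with every token kept as a string.
cst = ("expr",
       ("term", ("factor", "1")),
       "+",
       ("term", ("factor", "2"), "*", ("factor", "3")))


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, BinOp]


def to_ast(node):
    """Collapse a concrete tree into an abstract one."""
    rule, *children = node
    if rule == "factor":
        if len(children) == 1:
            # factor -> NUMBER
            return Num(int(children[0]))
        # factor -> '(' expr ')': the parentheses carry no meaning of their own
        return to_ast(children[1])
    # expr and term have the same shape: operand (operator operand)*
    result = to_ast(children[0])
    for i in range(1, len(children), 2):
        result = BinOp(children[i], result, to_ast(children[i + 1]))
    return result


ast = to_ast(cst)
print(ast)
```

The `expr`, `term`, and `factor` layers are gone from the result. Precedence is now expressed only by nesting: the multiplication sits underneath the addition.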

You usually don’t need to actually build a concrete syntax tree. The action routines in your YACC (or ANTLR, or Menhir, or whatever…) grammar can build the abstract syntax tree directly, so the concrete syntax tree only exists as a conceptual entity representing the parse structure of your source text.
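
The same idea can be sketched without a parser generator. The hand-written recursive-descent parser below is an assumption for illustration, not YACC, ANTLR, or Menhir code: each method plays the role of one grammar rule, and the lines that construct `Num` and `BinOp` stand in for the action routines. The names `tokenize`, `Parser`, and `parse` are hypothetical. No concrete tree is ever allocated; it exists only implicitly in the chain of method calls.

```python
import re
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: object
    right: object


def tokenize(text):
    # Integers, operators, and parentheses. Any other character is
    # silently skipped, which is good enough for a sketch.
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr -> term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = BinOp(op, node, self.term())  # the "action": build AST directly
        return node

    def term(self):
        # term -> factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        # factor -> NUMBER | '(' expr ')'
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node  # parentheses leave no trace in the AST
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token: {tok!r}")
        return Num(int(tok))


def parse(text):
    parser = Parser(tokenize(text))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token: {parser.peek()!r}")
    return node


print(parse("(1 + 2) * 3"))
```

The grammar levels still drive the parse, which is why `(1 + 2) * 3` and `1 + 2 * 3` come out as differently nested trees, but the only thing the parser returns is the abstract tree.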
