How to benchmark Boost Spirit Parser?

I have given things a quick scan.

My profiler quickly told me that constructing the grammar and (especially) the lexer object took quite some resources.

Indeed, just changing a single line in SpiritParser.cpp saved 40% of execution time¹ (~28s down to ~17s):

    lexer::Lexer lexer;

into

    static const lexer::Lexer lexer;

Now,

making the grammar static involves making it stateless. I made this happen by
- moving position_begin into qi::_a (using qi::locals) and
- passing it in as an inherited attribute at the appropriate times
  - the EDDIGrammar and ValueGrammar grammars, e.g.
```
start %= qi::eps [ qi::_a = qi::_r1 ] >> program;
```
  - as well as the individual rules from ValueGrammar that are being used externally).
This had a number of suboptimal side effects:
- rule debugging is commented out because the lexer::pos_iterator_type has no default output streaming overload
- the qi::position(position_begin) expression has been ‘faked’ with a rather elaborate replacement:
```
auto local_pos = qi::lazy (
        boost::phoenix::construct<qi::position>(qi::_a)
    );
```
  That doesn’t seem ideal. (Ideally, one would like to replace qi::position by a modified custom parser directive that knows how to get get begin_position from qi::locals (?) so there would be no need to invoke a parser expression lazily?)

Anyways, implementing these further changes² shaved off another ~15% of execution time:

static const lexer::Lexer lexer;
static const parser::EddiGrammar grammar(lexer); 

try {
    bool r = spirit::lex::tokenize_and_parse(
            position_begin, position_end,
            lexer, 
            grammar(boost::phoenix::cref(position_begin)),
            program);

Loose ideas:

Have you considered generating a static lexer (Generating the Static Analyzer)
Have you considered using expectation points to potentially reduce the amount of backtracking (note: I didn’t measure anything in that area)
Have you considered alternatives for Position::file and Position::theLine? Copying the strings seems heavier than necessary. I’d prefer to store const char *. You could also look at Boost Flyweight
Is the pre-skip really required inside your qi::position directive?
(Somewhat non-serious: have you considered porting to Spirit X3? It seems to promise potential benefits in the form of move semantics.)

Hope this helps.

^[1] When parsing all test cases in test/cases/*.eddi 100 times like so (github):

for (int i=0; i<100; i++)
    for (auto& fname : argv)
{
    eddic::ast::SourceFile program;
    std::cout << fname << ": " << std::boolalpha << parser.parse(fname, program, nullptr) << "\n";
}

Timed with a simple

time ./test ../../test/cases/*.eddi | md5sum

With the md5sum acting as a sanity check.

^[2] I created a pull request with the proof-of-concept refactorings here https://github.com/wichtounet/eddic/pull/52

More Related Contents:

Leave a Comment Cancel reply