Why is a character class faster than alternation?

This is because the “OR” construct | backtracks between its alternatives. If the first alternative does not match, the engine has to move the pointer back to where it was before it started matching that alternative, and then try the next alternative from there. A character class, by contrast, tests the current character against the whole set in one go and can simply advance. See this match on a regex engine with optimizations disabled:

Pattern: (r|f)at
Match string: carat

[Figure: match of the alternation pattern (r|f)at against carat]

Pattern: [rf]at
Match string: carat

[Figure: match of the character-class pattern [rf]at against carat]
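To make the difference concrete without depending on any particular engine, here is a minimal sketch of a toy backtracking matcher with no optimizations. The node classes Lit, Cls and Alt and the ToyMatcher class are hypothetical, made up for this illustration. Patterns are built by hand instead of parsed, and "steps" counts single-character comparisons, which is an assumption of this model and not the step count a real engine reports.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Lit:
    """Hypothetical node: a single literal character, e.g. 'a'."""
    ch: str


@dataclass(frozen=True)
class Cls:
    """Hypothetical node: a character class, e.g. [rf]."""
    chars: frozenset


@dataclass(frozen=True)
class Alt:
    """Hypothetical node: an alternation, e.g. (r|f); each branch is a tuple of nodes."""
    branches: tuple


Node = Union[Lit, Cls, Alt]


class ToyMatcher:
    """Naive backtracking matcher without optimizations (illustration only)."""

    def __init__(self) -> None:
        self.steps = 0            # single-character comparisons performed
        self.failed_branches = 0  # alternatives that failed and were abandoned

    def match_at(self, seq: List[Node], text: str, pos: int) -> Optional[int]:
        """Return the end index of a match of `seq` starting at `pos`, or None."""
        if not seq:
            return pos
        node, rest = seq[0], seq[1:]
        if isinstance(node, Alt):
            for branch in node.branches:
                # Try this branch followed by the rest of the pattern.
                end = self.match_at(list(branch) + rest, text, pos)
                if end is not None:
                    return end
                # The branch failed: go back to `pos` and try the next branch.
                self.failed_branches += 1
            return None
        # Literal or class: exactly one comparison against the current character.
        self.steps += 1
        if pos < len(text):
            ch = text[pos]
            ok = ch == node.ch if isinstance(node, Lit) else ch in node.chars
            if ok:
                return self.match_at(rest, text, pos + 1)
        return None

    def search(self, pattern: List[Node], text: str) -> Optional[Tuple[int, int]]:
        """Try every start position, like an unanchored regex search."""
        for start in range(len(text) + 1):
            end = self.match_at(pattern, text, start)
            if end is not None:
                return (start, end)
        return None


ALT_PATTERN = [Alt(((Lit("r"),), (Lit("f"),))), Lit("a"), Lit("t")]  # (r|f)at
CLASS_PATTERN = [Cls(frozenset("rf")), Lit("a"), Lit("t")]            # [rf]at

for name, pattern in [("(r|f)at", ALT_PATTERN), ("[rf]at", CLASS_PATTERN)]:
    m = ToyMatcher()
    span = m.search(pattern, "carat")
    print(f"{name}: span={span} steps={m.steps} failed_branches={m.failed_branches}")

# Prints:
# (r|f)at: span=(2, 5) steps=7 failed_branches=4
# [rf]at: span=(2, 5) steps=5 failed_branches=0
```

At positions 0 and 1 the alternation tries r, fails, goes back and tries f, while the class rejects each character with a single membership test. Both find the same match; only the amount of work differs.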


In short, the fact that the engine optimizes this case away (rewriting an alternation of single literal characters into a character class) is already a good hint that alternation is the less efficient construct.
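CPython's re module is one engine that appears to do this rewrite in its parser. A hedged way to observe it is to compile with the re.DEBUG flag, which prints the parsed (and, in newer versions, compiled) pattern, and check whether a BRANCH node survives. The dump format is an implementation detail that varies between Python versions, and the helper name debug_dump is hypothetical.

```python
import contextlib
import io
import re


def debug_dump(pattern: str) -> str:
    """Compile `pattern` with re.DEBUG and return the debug text it prints."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # re.DEBUG makes CPython print the parse tree to stdout during compilation.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


for pattern in ["(r|f)at", "(ra|fa)t"]:
    dump = debug_dump(pattern)
    # A remaining BRANCH node means the alternation was kept as an alternation.
    verdict = "BRANCH kept" if "BRANCH" in dump else "rewritten, no BRANCH"
    print(f"{pattern}: {verdict}")
```

Single-character alternatives such as (r|f) should show up as an IN (character set) node, while (ra|fa) cannot be turned into a set and keeps its BRANCH.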
