To use or not to use regular expressions?

Regular expressions are a text processing tool for character-based tests. More formally, regular expressions are good at handling regular languages and bad at almost anything else.

In practice, this means that regular expressions are not well suited for tasks that require discovering meaning (semantics) in text that goes beyond the character level. This would require a full-blown parser.

In your particular case: recognizing a number in a text is an exercise that regular expressions are good at (decimal numbers can be trivially described using a regular language). This works on the character level.

But doing more advanced stuff with the number that requires knowledge of its numerical value (i.e. its semantics) requires interpretation. Regular expressions are bad at this. So finding a number in text is easy. Finding a number in text that is greater than 11 but smaller than 1004 (or that is divisible by 3) is hard: it requires recognizing the meaning of the number.

Leave a Comment