How to check that a string is a palindrome using regular expressions?

The answer to this question is that “it is impossible”. More specifically, the interviewer is wondering if you paid attention in your computational theory class.

In your computational theory class you learned about finite state machines. A finite state machine is composed of nodes and edges. Each edge is annotated with a letter from a finite alphabet. One or more nodes are special “accepting” nodes and one node is the “start” node. As each letter is read from a given word we traverse the given edge in the machine. If we end up in an accepting state then we say that the machine “accepts” that word.

A regular expression can always be translated into an equivalent finite state machine. That is, one that accepts and rejects the same words as the regular expression (in the real world, some regexp languages allow for arbitrary functions, these don’t count).

It is impossible to build a finite state machine that accepts all palindromes. The proof relies on the facts that we can easily build a string that requires an arbitrarily large number of nodes, namely the string

a^x b a^x (eg., aba, aabaa, aaabaaa, aaaabaaaa, ….)

where a^x is a repeated x times. This requires at least x nodes because, after seeing the ‘b’ we have to count back x times to make sure it is a palindrome.

Finally, getting back to the original question, you could tell the interviewer that you can write a regular expression that accepts all palindromes that are smaller than some finite fixed length. If there is ever a real-world application that requires identifying palindromes then it will almost certainly not include arbitrarily long ones, thus this answer would show that you can differentiate theoretical impossibilities from real-world applications. Still, the actual regexp would be quite long, much longer than equivalent 4-line program (easy exercise for the reader: write a program that identifies palindromes).

Leave a Comment