Regexp finding longest common prefix of two strings

If there’s some character that neither string contains —, say, \0 — you could write

"$first\0$second" =~ m/^(.*).*\0\1/s;

and the longest common prefix would be saved as $1.


Edited to add: This is obviously very inefficient. I think that if efficiency is a concern, then this simply isn’t the approach we should be using; but we can at least improve it by changing .* to [^\0]* to prevent useless greediness that will just have to be backtracked again, and wrapping the second [^\0]* in (?>…) to prevent backtracking that can’t help. This:

"$first\0$second" =~ m/^([^\0]*)(?>[^\0]*)\0\1/s;

This will yield the same result, but much more efficiently. (But still not nearly as efficiently as a straightforward non–regex-based approach. If the strings both have length n, I’d expect its worst case to take at least O(n2) time, whereas the straightforward non–regex-based approach would take O(n) time in its worst case.)

Here’s a Python one-liner:

>>> a="stackoverflow"
>>> b = 'stackofpancakes'
>>> a[:[x[0]==x[1] for x in zip(a,b)].index(0)]
0: 'stacko'
>>> a="nothing in"
>>> b = 'common'
>>> a[:[x[0]==x[1] for x in zip(a,b)].index(0)]
1: ''
>>> 

Leave a Comment