How to tell if one regular expression matches a subset of another regular expression?

I think — in theory — to tell whether regexp A matches a subset of what regexp B matches, an algorithm could:

  1. Compute the minimal Deterministic Finite Automaton of B and also of the “union” A|B.
  2. Check if the two DFAs are identical. This is true if and only if A matches a subset of what B matches.

However, it would likely be a major project to do this in practice. There are explanations such as Constructing a minimum-state DFA from a Regular Expression but they only tend to consider mathematically pure regexps. You would also have to handle the extensions that Python adds for convenience. Moreover, if any of the extensions cause the language to be non-regular (I am not sure if this is the case) you might not be able to handle those ones.

But what are you trying to do? Perhaps there’s an easier approach…?

Leave a Comment