Regular Expressions with repeated characters

Usually for this type of question, if the regex is not simple enough to be derived directly, you can start from drawing a DFA and derive a regex from there.

You should be able to derive the following DFA. q1, q2, q3, q4 are end states, with q1 also being the start state. q5 is the failed/trap state.

DFA

There are several methods to find Regular Expression for a DFA. I am going to use Brzozowski Algebraic Method as explained in section 5 of this paper:

For each state qi, the equation Ri is a union of terms: for a transition a from qi to qj, the term is aRj. Basically, you will look at all the outgoing edges from a state. If Ri is a final state, λ is also one of the terms.

Let me quote the identities from the definition section of the paper, since they will come in handy later (λ is the empty string and ∅ is the empty set):

(ab)c = a(bc) = abc
λx = xλ = x
∅x = x∅ = ∅
∅ + x = x
λ + x* = x*
(λ + x)* = x*

Since q5 is a trap state, the formula will end up an infinite recursion, so you can drop it in the equations. It will end up as empty set and disappear if you include it in the equation anyway (explained in the appendix).

You will come up with:

R1 = xR2 + yR3 + zR4 + λ
R2 =     + yR3 + zR4 + λ
R3 = xR2 +     + zR4 + λ
R4 = xR2 + yR3       + λ

Solve the equation above with substitution and Arden’s theorem, which states:

Given an equation of the form X = AX + B where λ ∉ A, the equation has the solution X = A*B.

You will get to the answer.

I don’t have time and confidence to derive the whole thing, but I will show the first few steps of derivation.

Remove R4 by substitution, note that zλ becomes z due to the identity:

R1 = xR2 + yR3 + (zxR2 + zyR3 + z) + λ
R2 =     + yR3 + (zxR2 + zyR3 + z) + λ
R3 = xR2 +     + (zxR2 + zyR3 + z) + λ

Regroup them:

R1 = (x + zx)R2 + (y + zy)R3 + z + λ
R2 =       zxR2 + (y + zy)R3 + z + λ
R3 = (x + zx)R2 +       zyR3 + z + λ

Apply Arden’s theorem to R3:

R3 = (zy)*((x + zx)R2 + z + λ)
   = (zy)*(x + zx)R2 + (zy)*z + (zy)*

You can substitute R3 back to R2 and R1 and remove R3. I leave the rest as exercise. Continue ahead and you should reach the answer.

Appendix

We will explain why trap states can be discarded from the equations, since they will just disappear anyway. Let us use the state q5 in the DFA as an example here.

R5 = (x + y + z)R5

Use identity ∅ + x = x:

R5 = (x + y + z)R5 + ∅

Apply Arden’s theorem to R5:

R5 = (x + y + z)*∅

Use identity ∅x = x∅ = ∅:

R5 = ∅

The identity ∅x = x∅ = ∅ will also take effect when R5 is substituted into other equations, causing the term with R5 to disappear.

Leave a Comment