Is it possible for a computer to “learn” a regular expression by user-provided examples?

Yes,
it is possible,
we can generate regexes from examples (text -> desired extractions).
This is a working online tool which does the job: http://regex.inginf.units.it/

Regex Generator++ online tool generates a regex from provided examples using a GP search algorithm.
The GP algorithm is driven by a multiobjective fitness which leads to higher performance and simpler solution structure (Occam’s Razor).
This tool is a demostrative application by the Machine Lerning Lab, Trieste Univeristy (Università degli studi di Trieste).
Please look at the video tutorial here.

This is a research project so you can read about used algorithms here.

Behold! 🙂

Finding a meaningful regex/solution from examples is possible if and only if the provided examples describe the problem well.
Consider these examples that describe an extraction task, we are looking for particular item codes; the examples are text/extraction pairs:

"The product code is 467-345A" -> "467-345A"
"The item 789-345B is broken"  -> "789-345B"

An (human) guy, looking at the examples, may say: “the item codes are things like \d++-345[AB]”

When the item code is more permissive but we have not provided other examples, we have not proofs to understand the problem well.
When applying the human generated solution \d++-345[AB] to the following text, it fails:

"On the back of the item there is a code: 966-347Z"

You have to provide other examples, in order to better describe what is a match and what is not a desired match:
–i.e:

"My phone is +39-128-3905 , and the phone product id is 966-347Z" -> "966-347Z"

The phone number is not a product id, this may be an important proof.

Leave a Comment