Verbs that act after backtracking and failure

Before reading this answer, you should be familiar with the mechanism of backtracking, atomic groups, and possessive quantifiers. You can find information about these notions and features in the Friedl book and following these links: www.regular-expressions.info, www.rexegg.com

All the test has been made with a global search (with the preg_match_all() function).

(*FAIL) (or the shorthand (*F))

baabo caaco daado

caac(*FAIL)|aa.|caaco|co

[0] => aab
[1] => caaco
[2] => aad

(*FAIL) causes exactly the same behavior as a “bad character” in a pattern. If you replace it with an “R”, for example, you obtain exactly the same result: caacR|aa.|caaco|co. To be more general, you can indeed replace (*FAIL) with an “always failing subpattern” like: (?!), (?=a(?<!a)),…

a (first from "baabo") : Without surprise, the first result is found by the second alternative. (aab)

c (first) : The regex engine encounters the first “c” and tries the first alternative and finds: caac, but the subpattern is forced to fail. Then the regex engine (always from the first “c”) tries the second alternative that fails, the third alternative succeeds. (caaco)

a (first from "daado") : the third result is found by the second alternative. (aad)

(*SKIP)

baabo caaco daado

caa(*SKIP)c(*FAIL)|aa.|caaco|co

[0] => aab
[1] => co
[2] => aad

This verb defines a point beyond which the regex engine is not allowed to backtrack when the subpattern fails later. Consequently, all the characters found before with the subpattern are consumed, once and for all, and can not be used for another part of the pattern (alternative).

a (first from "baabo") : the first result is found by the second alternative. (aab)

c (first) : the regex engine finds caac as in the first case, then fails (cause of the (*FAIL) verb), backtracks to the second “c” but is not allowed to backtrack to characters that are previously matched (“caa”) before the (*SKIP) verb.
c (second) : Now, the regex engine tries always the first alternative but in this new position and fails since there is an “o” and not an “a” after, then it backtracks to this second “c”. Note that in this case, these characters are not consumed as previously because the subpattern has failed before to have reached the (*SKIP) verb.
The second alternative is tested and fails (doesn’t begin with a “c”). The third alternative fails too because the next character is an “o” and not an “a”.
The fourth alternative succeeds and gives the second result. (co)

a (first from "daado") : the third result is found by the second alternative. (aad)

(*PRUNE)

baabo caaco daado

caa(*PRUNE)c(*FAIL)|aa.|caaco|co

[0] => aab
[1] => aac
[2] => aad

This verb is different from (*SKIP) because it doesn’t forbid using all the previous matched characters, but skips the first matched character by the subpattern (or forbids a subpattern to start with) if the subpattern will fail later.

a (first from "baabo") : the first result is found by the second alternative. (aab)

c (first) : the regex engine finds caac as in the first case, then fails, but now backtracks to the first “a” from “caaco” since the first “c” is skipped.
a (first from "caaco") : the first alternative is tried and fails, the second succeeds and gives the second result. (aac)

a (first from "daado") : the third result is found by the second alternative. (aad)

Leave a Comment