Verbs that act after backtracking and failure

Before reading this answer, you should be familiar with the mechanism of backtracking, atomic groups, and possessive quantifiers. You can find information about these notions and features in the Friedl book and by following these links: www.regular-expressions.info, www.rexegg.com. All the tests were run as a global search (with the preg_match_all() function). (*FAIL) (or the shorthand …

Regular expression – PCRE does not support \L, \l, \N, \P,

PCRE does not support the \uXXXX syntax. Use \x{XXXX} instead. Your \u2e80-\u9fff range is also equivalent to \p{InCJK_Radicals_Supplement}\p{InKangxi_Radicals}\p{InIdeographic_Description_Characters}\p{InCJK_Symbols_and_Punctuation}\p{InHiragana}\p{InKatakana}\p{InBopomofo}\p{InHangul_Compatibility_Jamo}\p{InKanbun}\p{InBopomofo_Extended}\p{InKatakana_Phonetic_Extensions}\p{InEnclosed_CJK_Letters_and_Months}\p{InCJK_Compatibility}\p{InCJK_Unified_Ideographs_Extension_A}\p{InYijing_Hexagram_Symbols}\p{InCJK_Unified_Ideographs}. Don’t forget to add the u modifier (/regex here/u) if you’re dealing with UTF-8. If you’re dealing with another multi-byte encoding, you must first convert it to UTF-8.
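As a rough illustration of the \x{XXXX} syntax combined with the u modifier, the sketch below pulls the first run of characters in the U+2E80 to U+9FFF range out of a UTF-8 string. The function name firstCjkRun is hypothetical and exists only for this example; the subject is assumed to already be valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original answer): returns the first run of
// characters in the range U+2E80..U+9FFF, or null when there is none.
function firstCjkRun(string $utf8Text): ?string
{
    // \x{...} is PCRE's code point escape; the u modifier makes both the
    // pattern and the subject be treated as UTF-8.
    if (preg_match('/[\x{2e80}-\x{9fff}]+/u', $utf8Text, $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(firstCjkRun('abc 漢字 def'));
var_dump(firstCjkRun('plain ASCII only'));
```

The pattern is written in single quotes so that PHP passes the backslash through to PCRE unchanged.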

php regex to match outside of html tags

You can use an assertion for that, as you just have to ensure that the searched words occur somewhere after a >, or before any <. The latter test is easier to accomplish, as lookahead assertions can be variable length: /(asf|foo|barr)(?=[^>]*(<|$))/ See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
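A minimal sketch of that lookahead in use, with a made-up sample string. The function name findOutsideTags is hypothetical, and the approach assumes that a literal > never appears in the text outside of tags.

```php
<?php
// Hypothetical wrapper around the pattern quoted above.
function findOutsideTags(string $html): array
{
    // The lookahead only succeeds if, scanning forward without crossing a ">",
    // we reach either a "<" or the end of the string. A word inside a tag is
    // followed by ">" before any "<", so it is rejected.
    preg_match_all('/(asf|foo|barr)(?=[^>]*(<|$))/', $html, $m);
    return $m[0];
}

// The "foo" inside the href attribute is skipped; the other two words match.
print_r(findOutsideTags('<a href="foo">foo bar</a> asf'));
```

This is a pragmatic shortcut rather than a real HTML parser, so it is probably best kept to markup whose shape is known in advance.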

preg_match and UTF-8 in PHP

Although the u modifier makes both the pattern and subject be interpreted as UTF-8, the captured offsets are still counted in bytes. You can use mb_strlen to get the length in UTF-8 characters rather than bytes: $str = "\xC2\xA1Hola!"; preg_match('/H/u', $str, $a_matches, PREG_OFFSET_CAPTURE); echo mb_strlen(substr($str, 0, $a_matches[0][1]));
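The same idea can be wrapped in a small function, sketched here under the assumption that the mbstring extension is available and the subject is valid UTF-8. The name charOffsetOfMatch is hypothetical.

```php
<?php
// Hypothetical helper: character offset (not byte offset) of the first match,
// or null when the pattern does not match.
function charOffsetOfMatch(string $pattern, string $subject): ?int
{
    if (preg_match($pattern, $subject, $m, PREG_OFFSET_CAPTURE) !== 1) {
        return null;
    }
    $byteOffset = $m[0][1]; // PCRE reports offsets in bytes, even with /u
    // Count the UTF-8 characters in the bytes that precede the match.
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

$str = "\xC2\xA1Hola!"; // the inverted exclamation mark takes two bytes in UTF-8
preg_match('/H/u', $str, $m, PREG_OFFSET_CAPTURE);
echo $m[0][1], "\n";                        // byte offset of "H"
echo charOffsetOfMatch('/H/u', $str), "\n"; // character offset of "H"
```

Passing the encoding to mb_strlen explicitly avoids depending on the configured internal encoding.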

How can I convert ereg expressions to preg in PHP?

The biggest change in the syntax is the addition of delimiters. ereg('^hello', $str); preg_match('/^hello/', $str); Delimiters can be pretty much anything that is not alphanumeric, a backslash or a whitespace character. The most used are generally ~, / and #. You can also use matching brackets: preg_match('[^hello]', $str); preg_match('(^hello)', $str); preg_match('{^hello}', $str); // etc If …
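To see that these delimiter choices behave the same way, the following sketch runs one anchored pattern with each of them against a sample string. The variable names and sample strings are made up for the example.

```php
<?php
$str = 'hello world';

// The same pattern, ^hello, wrapped in different delimiters.
$patterns = ['/^hello/', '~^hello~', '#^hello#', '[^hello]', '(^hello)', '{^hello}'];

$results = [];
foreach ($patterns as $pattern) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$pattern] = preg_match($pattern, $str);
}
print_r($results);

// With bracket delimiters, '[^hello]' is the pattern ^hello, not a negated
// character class, so it does not match when "hello" is not at the start.
$notAtStart = preg_match('[^hello]', 'say hello');
var_dump($notAtStart);
```

Bracket-style delimiters can be easy to misread, which may be one reason ~, / and # are the more common choices.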