PHP Regex: How to match \r and \n without using [\r\n]?

PCRE and newlines

PCRE has a superfluity of newline-related escape sequences and alternatives.

Well, a nifty escape sequence that you can use here is \R. By default, \R matches any Unicode newline sequence, but what it matches can be restricted with a special setting at the start of the pattern, as shown below.

Without the u modifier, \R matches the newline sequences made of single-byte characters (code points below 256; note that NEL, \x85, lies outside the ASCII range):

preg_match('~\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85)
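A minimal sketch of why this matters for the original question, assuming a subject with mixed line endings (the sample string and variable names are illustrative only). With preg_split(), \R keeps "\r\n" together as one break, whereas the character class [\r\n] splits it in two and leaves an empty piece:

```php
<?php
// Illustrative subject with CRLF (Windows), LF (Unix) and CR (classic Mac) endings.
$text = "alpha\r\nbeta\ngamma\rdelta";

// \R consumes "\r\n" as ONE line break, so no empty pieces appear.
$lines = preg_split('~\R~', $text);
// $lines is ['alpha', 'beta', 'gamma', 'delta']

// A character class matches one character at a time, so "\r\n" counts as
// TWO breaks and an empty string appears between them.
$naive = preg_split('~[\r\n]~', $text);
// $naive is ['alpha', '', 'beta', 'gamma', 'delta']

// The explicit atomic group gives the same result as \R for this subject.
$explicit = preg_split('~(?>\r\n|\n|\r|\f|\x0b|\x85)~', $text);
var_dump($explicit === $lines); // bool(true)
```

The \r\n alternative comes first inside the atomic group, so a CRLF pair is always taken as a unit and never re-split into \r and \n.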

To match any Unicode newline sequence, including characters beyond the single-byte range such as the line separator (U+2028) and the paragraph separator (U+2029), turn on the u (Unicode) modifier:

preg_match('~\R~u', $string);

The u (PCRE_UTF8) modifier turns on additional PCRE functionality that is incompatible with Perl: pattern and subject strings are treated as UTF-8.

This is equivalent to the following group:

(?>\r\n|\n|\r|\f|\x0b|\x85|\x{2028}|\x{2029})
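The effect of the u modifier can be seen with a hypothetical string containing U+2028 and U+2029, written with PHP 7's "\u{...}" escape. Without u, PCRE inspects the UTF-8 bytes one at a time and none of them is a newline character; with u, both separators are recognized:

```php
<?php
// Illustrative subject: "\u{2028}" and "\u{2029}" are encoded as UTF-8 bytes
// (E2 80 A8 and E2 80 A9) inside a double-quoted PHP 7+ string.
$text = "one\u{2028}two\u{2029}three";

// Without u: the bytes E2, 80, A8, A9 are not newlines, so nothing is split.
$bytes = preg_split('~\R~', $text);
// $bytes has a single element: the whole string

// With u: the subject is read as UTF-8 code points and both separators match.
$unicode = preg_split('~\R~u', $text);
// $unicode is ['one', 'two', 'three']
```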

It is possible to restrict \R to match CR, LF, or CRLF only:

preg_match('~(*BSR_ANYCRLF)\R~', $string);

This is equivalent to the following group:

(?>\r\n|\n|\r)
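A short sketch of the difference, assuming a sample string that contains a form feed (\f) as well as a CRLF. The default \R (assumed here to be the usual Unicode setting) breaks on both, while (*BSR_ANYCRLF), which has to appear at the very start of the pattern, ignores the form feed:

```php
<?php
// Illustrative subject: a form feed between "pages", then a Windows line ending.
$text = "page1\fpage2\r\nline3";

// Default \R also treats the form feed as a line break.
$default = preg_split('~\R~', $text);
// $default is ['page1', 'page2', 'line3']

// With (*BSR_ANYCRLF), \R matches only CR, LF or CRLF, so \f is kept.
$crlfOnly = preg_split('~(*BSR_ANYCRLF)\R~', $text);
// $crlfOnly is ["page1\fpage2", 'line3']
```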

Additional: newline conventions

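PCRE also supports the following conventions for indicating line breaks, each set by a special sequence at the very start of the pattern. They control what ^, $ (in multiline mode) and the dot treat as a newline; they do not change what \R matches, which is set separately with (*BSR_ANYCRLF) or (*BSR_UNICODE):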

(*CR)        carriage return
(*LF)        linefeed
(*CRLF)      carriage return, followed by linefeed
(*ANYCRLF)   any of the three above
(*ANY)       all Unicode newline sequences
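These conventions change what ^ and $ (with the m modifier) treat as a line boundary. The sketch below, using an illustrative subject "foo\rbar", compares (*LF) with (*CR), and also checks the documented PCRE behavior that a newline convention leaves \R untouched:

```php
<?php
// Illustrative subject with a lone carriage return.
$text = "foo\rbar";

// (*LF): a lone \r is not a newline, so ^ and $ never match around it.
preg_match_all('~(*LF)^\w+$~m', $text, $lf);
// $lf[0] is []

// (*CR): the \r ends the first line and starts the second.
preg_match_all('~(*CR)^\w+$~m', $text, $cr);
// $cr[0] is ['foo', 'bar']

// \R is independent of the newline convention: it still splits on "\n" under (*CR).
$parts = preg_split('~(*CR)\R~', "a\nb");
// $parts is ['a', 'b']
```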

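Note: \R does not have special meaning inside a character class. The original PCRE library (used by PHP before 7.3) treated it like other unrecognized escape sequences, matching the literal character “R” by default; PCRE2 (PHP 7.3 and later) rejects it, so a pattern such as [\R] fails to compile.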
