UTF-8 in PHP regular expressions

Updated answer:
This is now tested and working

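$post = "9999, škofja loka";
// Four digits, a comma, then one or more whitespace characters or Unicode letters up to the end of the string.
echo preg_match('/^\\d{4},[\\s\\p{L}]+$/u', $post); // prints 1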

\w is not suitable here: it is not guaranteed to cover all Unicode letters, and in addition to letters it also matches digits and the underscore ([0-9_]).
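To make the difference concrete, here is a minimal sketch comparing the two classes in Unicode mode. The variable names are only illustrative, and it assumes the file is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8.
$word    = '/^\w+$/u';     // \w: letters, digits and underscore
$letters = '/^\p{L}+$/u';  // \p{L}: letters only

$wordAcceptsDigits    = preg_match($word, 'abc_123');    // 1: digits and "_" are accepted
$lettersAcceptDigits  = preg_match($letters, 'abc_123'); // 0: digits and "_" are rejected
$lettersAcceptUnicode = preg_match($letters, 'škofja');  // 1: "š" is a Unicode letter

var_dump($wordAcceptsDigits, $lettersAcceptDigits, $lettersAcceptUnicode);
```

So even where \w does match the letters you need, it would also let through input such as "abc_123".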

The u modifier is also important: it activates Unicode mode, so the pattern and the subject string are treated as UTF-8.
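A small sketch of what the u modifier changes, using a single "š". The "\u{...}" escape needs PHP 7 or later; the variable names are just for illustration.

```php
<?php
// "š" (U+0161) is encoded as two bytes in UTF-8.
$char = "\u{0161}";

$bytes    = strlen($char);               // 2: strlen counts bytes
$withoutU = preg_match('/^.$/', $char);  // 0: without u, the dot sees two separate bytes
$withU    = preg_match('/^.$/u', $char); // 1: with u, the dot sees one character

var_dump($bytes, $withoutU, $withU);
```

Without the modifier the engine works byte by byte, which is why multibyte letters tend to break patterns that look correct.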

If letters and whitespace can be mixed after the comma, put both into the same character class. Your regex allows zero or more whitespace characters after the comma, followed only by letters, so a second word fails to match.
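As a rough illustration, the sketch below compares the two approaches. The first pattern is only an assumed reconstruction of the "whitespace first, then letters" shape, not necessarily the exact regex from the question.

```php
<?php
// Assumption: this file is saved as UTF-8.
$post = '9999, škofja loka';

// Hypothetical pattern: optional whitespace, then letters only.
$separate = '/^\d{4},\s*\p{L}+$/u';
// Whitespace and letters in one character class, so they can be mixed.
$combined = '/^\d{4},[\s\p{L}]+$/u';

$separateResult = preg_match($separate, $post); // 0: the space before "loka" is not allowed
$combinedResult = preg_match($combined, $post); // 1

var_dump($separateResult, $combinedResult);
```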

See http://www.regular-expressions.info/php.html for details on PHP regular expressions.

\p{L} is the Unicode property class that matches any kind of letter from any language.

The end-of-string anchor $ is also important: it ensures that the complete string is validated. Without it, the pattern can succeed after matching only part of the input, for example just the first whitespace after the comma, and ignore the rest.
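The effect of the anchor can be seen with an input that has unwanted trailing characters. This is a sketch with made-up input, assuming the file is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8.
$input = '9999, škofja loka!!!';

// Without $: a valid prefix is enough for the match to succeed.
$unanchored = preg_match('/^\d{4},[\s\p{L}]+/u', $input, $m); // 1
$matched    = $m[0];                                          // "9999, škofja loka"

// With $: the trailing "!!!" makes the whole match fail.
$anchored = preg_match('/^\d{4},[\s\p{L}]+$/u', $input);      // 0

var_dump($unanchored, $matched, $anchored);
```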
