How to match Cyrillic characters with a regular expression

If your regex flavor supports Unicode blocks or scripts, you can match Cyrillic characters with:

[\p{IsCyrillic}] or [\p{Cyrillic}]

Otherwise, try an explicit code-point range covering U+0400–U+04FF, written in your flavor's escape syntax, for example: [\u0400-\u04FF]

For PHP (with the u modifier) use: [\x{0400}-\x{04FF}]

Explanation: [\p{IsCyrillic}] matches a single character from the Unicode block “Cyrillic” (U+0400–U+04FF).

Note: see the Unicode character list and numeric HTML entities for U+0400–U+04FF.
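As a minimal sketch of the explicit-range fallback, here is how it might look in Python, whose built-in re module does not support \p{...} classes. The names CYRILLIC and find_cyrillic are hypothetical, chosen only for this illustration.

```python
import re

# The standard-library re module has no \p{...} support, so the Cyrillic
# block (U+0400–U+04FF) is spelled out as an explicit range.
CYRILLIC = re.compile(r'[\u0400-\u04FF]+')

def find_cyrillic(text):
    """Return every run of consecutive Cyrillic-block characters in text."""
    return CYRILLIC.findall(text)

print(find_cyrillic("Hello, Привет мир!"))  # ['Привет', 'мир']
```

Note that this range covers only the basic Cyrillic block; characters in the supplementary Cyrillic blocks fall outside it.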

Python and regular expression with Unicode

Are you using Python 2.x or 3.0? If you’re using 2.x, try making the regex string a Unicode string with the ‘u’ prefix. Since it’s a regex, it’s good practice to also make it a raw string with ‘r’. Also, putting your entire pattern in parentheses is superfluous.

re.sub(ur'[\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+', '', …)

http://docs.python.org/tutorial/introduction.html#unicode-strings

Edit: It’s also good practice …
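The ur'' prefix in the call above is accepted only by Python 2. As a rough sketch of the same substitution on Python 3 (3.3 or later, where the re module itself understands \uXXXX escapes in a raw pattern), something like the following should work. The names MARKS and strip_chars and the sample string are assumptions made for this illustration.

```python
import re

# Same character class as in the answer, written as a plain raw string.
# On Python 3 every str is Unicode, and re interprets the \uXXXX escapes.
MARKS = re.compile(r'[\u064B-\u0652\u06D4\u0670\u0674\u06D5-\u06ED]+')

def strip_chars(text):
    """Remove every character that falls inside the class above."""
    return MARKS.sub('', text)

# Three Arabic letters, each followed by U+064E (fatha), which is in the class.
sample = "\u0643\u064E\u062A\u064E\u0628\u064E"
print(strip_chars(sample) == "\u0643\u062A\u0628")  # True
```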

How can I use Unicode-aware regular expressions in JavaScript?

Situation for ES6: The ECMAScript language specification, edition 6 (also commonly known as ES2015), includes Unicode-aware regular expressions. Support must be enabled with the u modifier on the regex. See “Unicode-aware regular expressions in ES6” for a breakdown of the feature and some caveats. ES6 is widely adopted in both browsers and stand-alone JavaScript …