Is There a Way to Match Any Unicode Alphabetic Character?

Check out Unicode character properties: http://www.regular-expressions.info/unicode.html#prop. I think what you are looking for is probably

\p{L}

which will match any letters or ideographs. You may also want to include letters with marks on them, so you could do

\p{L}\p{M}*

In any case, all the different types of character properties are detailed in the first link.

Edit: You may also want to look at this Stack Overflow answer discussing whether \w matches unicode characters. They suggest that you could also use \p{Word} or \p{Alnum}: Does \w match all alphanumeric characters defined in the Unicode standard?

Leave a Comment