Use a regular expression to match any Chinese character in UTF-8 encoding

The regex to match a Chinese (well, CJK) character is

\p{script=Han}

which can be abbreviated to simply

\p{Han}

This assumes that your regex compiler meets requirement RL1.2 Properties from UTS#18 Unicode Regular Expressions. Perl and Java 7 both meet that spec, but many others do not.
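
As a rough illustration in Java (one of the two engines named above), here is a minimal sketch that uses the long form \p{script=Han}. The class name HanMatch and the helpers containsHan and countHan are made up for this example. Note that, as far as I know, Java spells its short form \p{IsHan} rather than \p{Han}, so the long form is the safer choice there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class HanMatch {
    // Long-form script property; supported by java.util.regex since Java 7.
    private static final Pattern HAN = Pattern.compile("\\p{script=Han}");

    // True if the string contains at least one Han character.
    public static boolean containsHan(String s) {
        return HAN.matcher(s).find();
    }

    // Counts Han characters (one match per code point).
    public static int countHan(String s) {
        Matcher m = HAN.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Non-ASCII text is written with Unicode escapes so the source file
        // compiles regardless of the compiler's default encoding.
        System.out.println(containsHan("hello"));                          // false
        System.out.println(containsHan("\u4f60\u597d, world"));            // true
        System.out.println(countHan("\u6f22\u5b57 and \u304b\u306a"));     // 2
    }
}
```

The last call returns 2 because the two kana belong to the Hiragana script, not Han. The pattern operates on decoded characters, so the UTF-8 bytes must already have been decoded into a string before matching.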
