How to match accented characters with a regex?

Instead of \w, use the POSIX bracket expression [:alpha:]:

"blåbær dèjá vu".scan /[[:alpha:]]+/  # => ["blåbær", "dèjá", "vu"]

"blåbær dèjá vu".scan /\w+/  # => ["bl", "b", "r", "d", "j", "vu"]

In your particular case, change the regex to this:

NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u

This does match much more than just accented characters, though. Which is a
good thing. Make sure you read this blog entry about common misconceptions
regarding names in software applications.

Leave a Comment