Why can’t I use accented characters next to a word boundary?

JavaScript’s \w and \b are not Unicode-aware. Their ‘word characters’ are only the ASCII letters, digits, and underscore ([A-Za-z0-9_]). That excludes é and every other accented or non-ASCII letter, even when the u flag is set.
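A quick check of what \w actually covers. This is only a sketch: the unicodeWord pattern is our own assumption of what a Unicode-aware word character could look like (it needs the ES2018 u flag for \p{...} property escapes). JavaScript's \w does not behave like this by itself.

```javascript
// \w in JavaScript is equivalent to [A-Za-z0-9_], even with the u flag.
const asciiWord = /^\w$/;

// Hypothetical Unicode-aware "word character": any letter, decimal digit, or underscore.
// \p{...} property escapes require the u flag (ES2018+).
const unicodeWord = /^[\p{L}\p{Nd}_]$/u;

const samples = ["a", "Z", "5", "_", "é", "ß", "ж"];
for (const ch of samples) {
  // Prints the character, then whether each pattern treats it as a word character.
  console.log(ch, asciiWord.test(ch), unicodeWord.test(ch));
}
```

The ASCII letters, digit, and underscore pass both tests; é, ß, and ж pass only the \p{L} version.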

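Because é is not a word character to JS, and \b only matches between a word character and a non-word character, the position between é and a following space can never be a word boundary. (Conversely, \b does match in the middle of a word like Namés, on both sides of the é.)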
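One way to see this is to list every position where \b matches. The helper name boundaryPositions is made up for illustration; it just collects the indices of the zero-width matches of /\b/g.

```javascript
// Return every index in s where \b matches (0 = before the first character).
function boundaryPositions(s) {
  // matchAll steps past zero-width matches automatically, so this terminates.
  return [...s.matchAll(/\b/g)].map((m) => m.index);
}

console.log(boundaryPositions("Namés"));   // [ 0, 3, 4, 5 ]  boundaries on both sides of é
console.log(boundaryPositions("namé is")); // [ 0, 3, 5, 7 ]  none at 4, between é and the space

// So a \b-delimited search fails where you'd expect a match,
// and matches where you'd expect it to fail:
console.log(/\bnamé\b/.test("namé is")); // false
console.log(/\bnamé\b/.test("namés"));   // true
```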

/(^|[\s.,!?])(fancy namé|namé)([\s.,!?]|$)/
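A sketch of that capture-group workaround in use. The findName helper is hypothetical. The leading group is written as (^|[\s.,!?]) so that a name at the very start of the string is also found, and the name itself comes out of group 2 because groups 1 and 3 hold the surrounding delimiters.

```javascript
// Delimiters before and after are captured by groups 1 and 3; the name is group 2.
const nameRe = /(^|[\s.,!?])(fancy namé|namé)([\s.,!?]|$)/;

// Hypothetical helper: return the matched name, or null if there is none.
function findName(text) {
  const m = nameRe.exec(text);
  return m ? m[2] : null;
}

console.log(findName("Hello, namé!"));   // namé
console.log(findName("namé is here"));   // namé  (matched via ^)
console.log(findName("my fancy namé.")); // fancy namé
console.log(findName("namés"));          // null
```

The alternation lists "fancy namé" first, so the longer name wins whenever both could match at the same position.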

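Yeah, that would be the usual workaround for JS (though probably with more punctuation characters). In other languages you’d generally use lookahead/lookbehind so that the characters before and after the name aren’t part of the match. JavaScript has lookahead, but lookbehind only arrived in ES2018 and older browsers (notably older versions of Safari) don’t support it, so if you need to run there it’s best avoided.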
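The practical difference shows up with adjacent matches. Because the capture-group workaround consumes the delimiter, a global search over "namé namé" finds only the first name, since the shared space is used up. The lookaround versions only check the neighbouring characters. This is a sketch that assumes an ES2018+ engine; the \p{L}\p{Nd}_ definition of a word character in the last pattern is our own assumption, not something built in.

```javascript
// Capture-group workaround (with ^ in the leading group): delimiters are part of each match.
const consuming = /(^|[\s.,!?])(fancy namé|namé)([\s.,!?]|$)/g;

// Lookaround version (lookbehind needs ES2018+): delimiters are checked, not consumed.
const lookaround = /(?<=^|[\s.,!?])(?:fancy namé|namé)(?=[\s.,!?]|$)/g;

// Assumed alternative: no Unicode letter, digit, or underscore directly before or after.
const unicodeBoundary = /(?<![\p{L}\p{Nd}_])(?:fancy namé|namé)(?![\p{L}\p{Nd}_])/gu;

const text = "namé namé";
console.log([...text.matchAll(consuming)].length);                // 1
console.log([...text.matchAll(lookaround)].map((m) => m[0]));     // [ 'namé', 'namé' ]
console.log([...text.matchAll(unicodeBoundary)].map((m) => m[0])); // [ 'namé', 'namé' ]
console.log([..."namés".matchAll(unicodeBoundary)].length);       // 0
```

The last pattern also avoids listing punctuation by hand: anything that is not a letter, digit, or underscore counts as a boundary.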
