PCRE supports UTF-8 out of the box; see the documentation for the ‘u’ modifier.
Illustration (\xC3\xA4 is the UTF-8 encoding of the German letter “ä”):
echo preg_replace('~\w~', '@', "a\xC3\xA4b");
This echoes “@@¤@” because “\xC3” and “\xA4” were treated as distinct symbols (which single bytes count as \w depends on the locale, so the exact output can vary).
echo preg_replace('~\w~u', '@', "a\xC3\xA4b");
(note the ‘u’) prints “@@@” because “\xC3\xA4” was treated as a single letter.
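The same difference can be seen without relying on \w or the locale. Here is a minimal sketch that counts matches of “.” on the same string; the variable names are only for illustration, and it assumes PHP 5.4 or later, where the third argument of preg_match_all is optional.

```php
<?php
// Same subject as above: "a", the two bytes of "ä" (\xC3\xA4), then "b".
$subject = "a\xC3\xA4b";

// Without 'u', '.' consumes one byte per match, so "ä" is counted twice.
$byteCount = preg_match_all('~.~', $subject);

// With 'u', '.' consumes one UTF-8 encoded character per match.
$charCount = preg_match_all('~.~u', $subject);

echo $byteCount, "\n"; // 4
echo $charCount, "\n"; // 3
```

Without the ‘u’ the pattern sees four bytes; with it, three characters.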