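The problem is that the regex word boundary \b does not match at the start of a word that begins with a non-ASCII character. In JavaScript, \b is defined in terms of \w, which covers only [A-Za-z0-9_], so a letter such as ä counts as a non-word character and there is no boundary between a space and ä.
Instead of using \b, try using (?:^|\\s):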
var title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";
// Does not match with \b, matches with (?:^|\s)
var searchterm = "äl";
// Does not match at the word start with \b, matches with (?:^|\s)
//var searchterm = "ää";
// Works with either pattern
//var searchterm = "wi";
if (new RegExp("(?:^|\\s)" + searchterm, "gi").test(title)) {
    $("#result").html("Match: (" + searchterm + "): " + title);
} else {
    $("#result").html("nothing found with term: " + searchterm);
}
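To compare the two patterns side by side without jQuery, here is a minimal sketch. The function names matchesWithWordBoundary and matchesAtWordStart are made up for this illustration, and the search term is assumed to contain no regex metacharacters.

```javascript
const title = "this is simple string with finnish word tämä on ääkköstesti älkää ihmetelkö";

// Variant using \b: \w is ASCII-only, so there is no boundary
// between a space and a following "ä".
function matchesWithWordBoundary(term, text) {
  return new RegExp("\\b" + term, "i").test(text);
}

// Variant using (?:^|\s): the term must follow the start of the
// string or a whitespace character.
function matchesAtWordStart(term, text) {
  return new RegExp("(?:^|\\s)" + term, "i").test(text);
}

console.log(matchesWithWordBoundary("wi", title)); // true
console.log(matchesWithWordBoundary("äl", title)); // false
console.log(matchesAtWordStart("wi", title));      // true
console.log(matchesAtWordStart("äl", title));      // true
console.log(matchesAtWordStart("ää", title));      // true
```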
Breakdown:
(?:  parentheses () form a capture group in regex. Parentheses that start with a question mark and colon, (?: , form a non-capturing group; they just group the terms together.
^  the caret matches the beginning of the string.
|  the bar is the “or” operator.
\s  matches whitespace (it appears as \\s in the string because the backslash has to be escaped).
)  closes the group.
So instead of using \b, which matches word boundaries but treats only ASCII letters, digits, and the underscore as word characters, we use a non-capturing group that matches the beginning of the string OR whitespace.
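One thing to watch when building the pattern by concatenation: any regex metacharacters in the search term are interpreted as pattern syntax. The sketch below escapes the term first; escapeRegExp and startsWordWith are hypothetical helper names, not part of any library, and the character class used for escaping is the commonly cited idiom rather than anything from the original code.

```javascript
// Escape characters that have a special meaning in a regex so the
// search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// True if some word in `text` (delimited by whitespace or the start
// of the string) begins with `term`, ignoring case.
function startsWordWith(term, text) {
  const pattern = new RegExp("(?:^|\\s)" + escapeRegExp(term), "i");
  return pattern.test(text);
}

console.log(startsWordWith("äl", "tämä on ääkköstesti älkää ihmetelkö")); // true
console.log(startsWordWith("c++", "i like c++ a lot"));                    // true
console.log(startsWordWith("a.c", "an abc here"));                         // false
```

Note that the group consumes the leading whitespace, so if you read the matched text (for example with match or replace) rather than only calling test, the match includes that space.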