How to remove invalid UTF-8 characters from a JavaScript string?

I use this simple and sturdy approach:

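// Return a copy of input containing only ASCII characters (codes 0-127).
function cleanString(input) {
    var output = "";
    for (var i = 0; i < input.length; i++) {
        // charCodeAt returns the UTF-16 code unit at position i
        if (input.charCodeAt(i) <= 127) {
            output += input.charAt(i);
        }
    }
    return output;
}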

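Basically, all you really want are the ASCII chars 0-127, so just rebuild the string char by char. If it's a good char, keep it; if not, ditch it. Keep in mind that this drops every non-ASCII character (accented letters, emoji and so on), not only the invalid ones. It's pretty robust, and if sanitization is your goal, it's fast enough (in fact it's really fast).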
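
If you prefer something shorter, the same filter can probably be written as a single regex replace. The sketch below also shows a gentler variant for the case where you want to keep valid non-ASCII text and only drop unpaired surrogates, which are the code units in a JavaScript string that cannot be encoded as UTF-8. The names stripNonAscii and stripLoneSurrogates are made up for this example; they are not part of the original answer or of any library.

```javascript
// Hypothetical helper: same result as cleanString, done with one regex.
function stripNonAscii(input) {
    // Remove every UTF-16 code unit outside the ASCII range 0-127.
    return input.replace(/[^\x00-\x7F]/g, "");
}

// Hypothetical helper: keep valid text (including non-ASCII), drop only
// lone surrogates. This is an assumption about what "invalid" means here.
function stripLoneSurrogates(input) {
    return input.replace(
        // First alternative matches a well-formed surrogate pair,
        // second alternative matches any surrogate left on its own.
        /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
        function (match) {
            // A two-unit match is a valid pair: keep it. Otherwise drop it.
            return match.length === 2 ? match : "";
        }
    );
}
```

For example, stripNonAscii("héllo") gives "hllo", while stripLoneSurrogates("héllo") leaves the string untouched because it contains no broken surrogates.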
