How do I make the Python interpreter correctly handle non-ASCII characters in string operations?

Throw out all characters that can’t be interpreted as ASCII:

def remove_non_ascii(s):
    # Keep only characters whose code is in the ASCII range (0-127).
    return "".join(c for c in s if ord(c) < 128)

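Keep in mind that this is guaranteed to work when s is a UTF-8 encoded byte string (as with Python 2's str type), because every byte of a multi-byte UTF-8 character has its highest bit set to 1. No byte belonging to a non-ASCII character can therefore be mistaken for an ASCII character.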
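For Python 3, where text (str) and encoded data (bytes) are separate types, the same idea can be sketched roughly as follows. The helper name remove_non_ascii_bytes and the sample string are made up for illustration and are not part of the original answer.

```python
def remove_non_ascii(s):
    """Drop every character of a str whose code point is outside ASCII."""
    return "".join(c for c in s if ord(c) < 128)


def remove_non_ascii_bytes(data):
    """Hypothetical helper: drop every byte >= 0x80 from UTF-8 encoded bytes.

    In Python 3, iterating over bytes yields ints, so ord() is not needed.
    """
    return bytes(b for b in data if b < 128)


text = "café naïve"            # sample input, chosen for illustration
encoded = text.encode("utf-8")

# Every byte of a multi-byte UTF-8 sequence has its highest bit set,
# which is why filtering byte by byte never leaves a stray ASCII byte.
assert all(b >= 0x80 for b in "é".encode("utf-8"))

print(remove_non_ascii(text))           # caf nave
print(remove_non_ascii_bytes(encoded))  # b'caf nave'
```

If only the str case matters, text.encode("ascii", errors="ignore").decode("ascii") should give the same result using built-in methods alone.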
