Throw out all characters that can’t be interpreted as ASCII:
    def remove_non_ascii(s):
        return "".join(c for c in s if ord(c) < 128)
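Keep in mind that if you apply this to encoded bytes rather than decoded text, it is guaranteed to work with the UTF-8 encoding: every byte of a multi-byte character has its highest bit set to 1, so no fragment of a non-ASCII character survives the filter.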
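As a rough illustration of the byte-level case, here is a minimal sketch assuming Python 3, where iterating over a bytes object yields integers, so ord() is not needed. The name remove_non_ascii_bytes and the sample string are made up for this example.

```python
def remove_non_ascii_bytes(data: bytes) -> bytes:
    # Each item is an int in range 0..255; ASCII bytes are below 128.
    # In UTF-8, every byte of a multi-byte character is >= 128,
    # so the whole character is dropped, never just part of it.
    return bytes(b for b in data if b < 128)


encoded = "naïve café".encode("utf-8")
print(remove_non_ascii_bytes(encoded))  # b'nave caf'
```

The same filter is not safe for encodings such as UTF-16, where bytes below 128 can appear inside the encoding of a non-ASCII character.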