Python Unicode Encode Error

Most likely you parsed the XML fine, and the problem appears when you try to print its contents, because they contain non-ASCII ("foreign") Unicode characters. Try encoding your Unicode string as ASCII first: unicodeData.encode('ascii', 'ignore'). The 'ignore' argument tells it to skip those characters. From the Python docs: …
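Here is a minimal Python 3 sketch of that approach. The helper name to_ascii is hypothetical. In Python 3, encode() returns bytes, so the sketch decodes the result back to str before printing:

```python
def to_ascii(text: str) -> str:
    """Drop every character that has no ASCII representation (hypothetical helper)."""
    # errors="ignore" silently skips characters the ascii codec cannot encode;
    # decoding the resulting bytes gives back a plain str that is safe to print.
    return text.encode("ascii", "ignore").decode("ascii")


print(to_ascii("café naïve"))  # caf nave
```

Characters are dropped, not transliterated, so "é" disappears entirely rather than becoming "e".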

How to remove accents and turn letters into “plain” ASCII characters?

If you have iconv installed, try this (the example assumes your input string is in UTF-8):

echo iconv('UTF-8', 'ASCII//TRANSLIT', $string);

(iconv is a library for converting between all kinds of encodings; it's efficient and included with many PHP distributions by default. Most of all, it's definitely easier and less error-prone than trying to roll your …
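iconv is a PHP/C facility. In Python, a common substitute is to decompose characters with unicodedata.normalize("NFKD", text) and then drop whatever is still non-ASCII. This is not identical to ASCII//TRANSLIT, whose results can depend on the locale. Here is a sketch, with strip_accents as a hypothetical helper name:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Approximate accent removal (hypothetical helper, not iconv's TRANSLIT)."""
    # NFKD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are non-ASCII, so ignoring them leaves the base letters.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(strip_accents("Crème Brûlée à Zürich"))  # Creme Brulee a Zurich
```

Letters with no decomposition, such as ß or ø, have no ASCII base letter to fall back to and are simply removed by this sketch.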

Why does Python print unicode characters when the default encoding is ASCII?

Thanks to bits and pieces from various replies, I think we can stitch together an explanation. When you try to print a Unicode string such as u'\xe9', Python implicitly tries to encode it using the encoding currently stored in sys.stdout.encoding. Python picks up this setting from the environment it was started from. If it can't …
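To see the mechanism, the following Python 3 sketch mimics that implicit step by encoding explicitly. The name encode_like_print is hypothetical, and real streams also apply their own errors setting, so this only approximates what print() does:

```python
import sys


def encode_like_print(text: str, encoding: str) -> bytes:
    """Roughly the step print() performs before writing (hypothetical helper)."""
    # Encode with the given encoding; the default errors="strict" raises
    # UnicodeEncodeError for characters the encoding cannot represent.
    return text.encode(encoding)


# Environment-dependent: typically 'utf-8' on a modern terminal.
print("stdout encoding:", sys.stdout.encoding)

print(encode_like_print("\xe9", "utf-8"))  # b'\xc3\xa9'
try:
    encode_like_print("\xe9", "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)  # ascii failed: ordinal not in range(128)
```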

Convert Unicode to ASCII without errors in Python

>>> u'aあä'.encode('ascii', 'ignore')
b'a'

Decode the string you get back, using either the charset in the appropriate meta tag in the response or in the Content-Type header, then encode.

The method encode(encoding, errors) accepts custom handlers for errors. The built-in handlers, besides ignore, are:

>>> u'aあä'.encode('ascii', 'replace')
b'a??'
>>> u'aあä'.encode('ascii', 'xmlcharrefreplace')
b'a&#12354;&#228;'
>>> u'aあä'.encode('ascii', …
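Custom handlers are registered through codecs.register_error. The sketch below registers a handler under the made-up name "qmark_codepoint" that replaces each unencodable character with its code point:

```python
import codecs


def question_mark_with_codepoint(err: UnicodeEncodeError):
    """Hypothetical handler: replace each bad character with '?U+XXXX'."""
    # err.object is the str being encoded; [start:end] is the failing run.
    bad = err.object[err.start:err.end]
    replacement = "".join(f"?U+{ord(ch):04X}" for ch in bad)
    # Return the replacement text and the index at which encoding resumes.
    return replacement, err.end


# Once registered, the name can be passed as the errors= argument.
codecs.register_error("qmark_codepoint", question_mark_with_codepoint)

print("aあä".encode("ascii", "qmark_codepoint"))  # b'a?U+3042?U+00E4'
```

A handler receives the UnicodeEncodeError and must return a replacement string plus the position at which encoding should continue. Here it resumes at err.end, just past the offending characters.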

How can I remove non-ASCII characters but leave periods and spaces?

You can filter all characters from the string that are not printable using string.printable, like this:

>>> s = "some\x00string. with\x15 funny characters"
>>> import string
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'somestring. with funny characters'

string.printable on my machine contains:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

EDIT: On Python 3, filter will …
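Here is a Python 3 sketch of the same filter. The name keep_printable is a hypothetical helper. Because filter() returns an iterator in Python 3, the surviving characters are joined back into a str:

```python
import string

PRINTABLE = set(string.printable)


def keep_printable(s: str) -> str:
    """Keep only characters listed in string.printable (hypothetical helper)."""
    # filter() yields the matching characters lazily; join rebuilds the string.
    return "".join(filter(lambda ch: ch in PRINTABLE, s))


print(keep_printable("some\x00string. with\x15 funny characters"))
# somestring. with funny characters
```

string.printable also contains the whitespace characters \t\n\r\x0b\x0c, so tabs and newlines survive. Non-ASCII characters such as "é" are removed along with control characters.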