UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128) [duplicate]

Python is trying to be helpful. You cannot decode Unicode data, it is already decoded. So Python first will encode the data (using the ASCII codec) to get bytes to decode. It is this implicit encoding that fails.

If you have Unicode data, it only makes sense to encode to UTF-8, not decode:

>>> print u'\u041e\u043b\u044c\u0433\u0430'
Ольга
>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8')
'\xd0\x9e\xd0\xbb\xd1\x8c\xd0\xb3\xd0\xb0'

If you wanted a Unicode value, then using a Unicode literal (u'...') is all you needed to do. No further decoding is necessary.

The same implicit conversion takes place in the other direction; if you tried to encode a bytestring you’d trigger an implicit decoding:

>>> u'\u041e\u043b\u044c\u0433\u0430'.encode('utf8').encode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

UnicodeEncodeError: ‘ascii’ codec can’t encode characters in position 0-5: ordinal not in range(128) [duplicate]

Leave a Comment Cancel reply

More Related Contents:

Leave a Comment Cancel reply