I see three solutions to this:
-
Change the output encoding, so it will always output UTF-8. See e.g. Setting the correct encoding when piping stdout in Python, but I could not get these example to work.
-
Following example code makes the output aware of your target charset.
# -*- coding: utf-8 -*- import sys print sys.stdout.encoding print u"Stöcker".encode(sys.stdout.encoding, errors="replace") print u"Стоескер".encode(sys.stdout.encoding, errors="replace")
This example properly replaces any non-printable character in my name with a question mark.
If you create a custom print function, e.g. called
myprint
, using that mechanisms to encode output properly you can simply replace print withmyprint
whereever necessary without making the whole code look ugly. -
Reset the output encoding globally at the begin of the software:
The page http://www.macfreek.nl/memory/Encoding_of_Python_stdout has a good summary what to do to change output encoding. Especially the section “StreamWriter Wrapper around Stdout” is interesting. Essentially it says to change the I/O encoding function like this:
In Python 2:
if sys.stdout.encoding != 'cp850': sys.stdout = codecs.getwriter('cp850')(sys.stdout, 'strict') if sys.stderr.encoding != 'cp850': sys.stderr = codecs.getwriter('cp850')(sys.stderr, 'strict')
In Python 3:
if sys.stdout.encoding != 'cp850': sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict') if sys.stderr.encoding != 'cp850': sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')
If used in CGI outputting HTML you can replace ‘strict’ by ‘xmlcharrefreplace’ to get HTML encoded tags for non-printable characters.
Feel free to modify the approaches, setting different encodings, …. Note that it still wont work to output non-specified data. So any data, input, texts must be correctly convertable into unicode:
# -*- coding: utf-8 -*- import sys import codecs sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace') print u"Stöcker" # works print "Stöcker".decode("utf-8") # works print "Stöcker" # fails