UnicodeEncodeError: ‘charmap’ codec can’t encode – character maps to , print function [duplicate]

I see three solutions to this:

  1. Change the output encoding, so it will always output UTF-8. See e.g. Setting the correct encoding when piping stdout in Python, but I could not get these example to work.

  2. Following example code makes the output aware of your target charset.

    # -*- coding: utf-8 -*-
    import sys
    
    print sys.stdout.encoding
    print u"Stöcker".encode(sys.stdout.encoding, errors="replace")
    print u"Стоескер".encode(sys.stdout.encoding, errors="replace")
    

    This example properly replaces any non-printable character in my name with a question mark.

    If you create a custom print function, e.g. called myprint, using that mechanisms to encode output properly you can simply replace print with myprint whereever necessary without making the whole code look ugly.

  3. Reset the output encoding globally at the begin of the software:

    The page http://www.macfreek.nl/memory/Encoding_of_Python_stdout has a good summary what to do to change output encoding. Especially the section “StreamWriter Wrapper around Stdout” is interesting. Essentially it says to change the I/O encoding function like this:

    In Python 2:

    if sys.stdout.encoding != 'cp850':
      sys.stdout = codecs.getwriter('cp850')(sys.stdout, 'strict')
    if sys.stderr.encoding != 'cp850':
      sys.stderr = codecs.getwriter('cp850')(sys.stderr, 'strict')
    

    In Python 3:

    if sys.stdout.encoding != 'cp850':
      sys.stdout = codecs.getwriter('cp850')(sys.stdout.buffer, 'strict')
    if sys.stderr.encoding != 'cp850':
      sys.stderr = codecs.getwriter('cp850')(sys.stderr.buffer, 'strict')
    

    If used in CGI outputting HTML you can replace ‘strict’ by ‘xmlcharrefreplace’ to get HTML encoded tags for non-printable characters.

    Feel free to modify the approaches, setting different encodings, …. Note that it still wont work to output non-specified data. So any data, input, texts must be correctly convertable into unicode:

    # -*- coding: utf-8 -*-
    import sys
    import codecs
    sys.stdout = codecs.getwriter("iso-8859-1")(sys.stdout, 'xmlcharrefreplace')
    print u"Stöcker"                # works
    print "Stöcker".decode("utf-8") # works
    print "Stöcker"                 # fails
    

Leave a Comment