Python 3 chokes on CP-1252/ANSI reading

Position 0x81 is unassigned in Windows-1252 (aka cp1252). In Latin-1 (aka ISO 8859-1) it is assigned to the control character U+0081 HIGH OCTET PRESET (HOP). I can reproduce your error in Python 3.1 like this:

    >>> b'\x81'.decode('cp1252')
    Traceback (most recent call last):
      …
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to …
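
To make the difference between the two code pages concrete, here is a minimal sketch that decodes the same bytes three ways. The sample bytes and the variable names are made up for illustration. Whether Latin-1 or a replacement character is the right fallback depends on where the data actually came from.

```python
# Hypothetical sample: "café " in a single-byte encoding, followed by the stray byte 0x81.
data = b"caf\xe9 \x81"

# Strict cp1252 decoding fails because 0x81 has no mapping in that code page.
try:
    data.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    failing_byte = data[exc.start]  # the byte the codec could not map

# Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
# so it never raises; 0x81 becomes the control character U+0081.
as_latin1 = data.decode("latin-1")

# Staying with cp1252 but passing errors="replace" substitutes U+FFFD
# for each undecodable byte instead of raising.
as_replaced = data.decode("cp1252", errors="replace")
```

Decoding as Latin-1 always succeeds, but that only shows the bytes were accepted, not that Latin-1 was the encoding the file was written in.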

Using .NET how to convert ISO 8859-1 encoded text files that contain Latin-1 accented characters to UTF-8

You need to get the proper Encoding object. ASCII is exactly what its name says: it supports only the 7-bit ASCII characters, so accented Latin-1 characters do not survive it. If what you want to do is convert files, then this is likely easier than dealing with the byte arrays directly:

    using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName, Encoding.GetEncoding("iso-8859-1")))
    {
        using (System.IO.StreamWriter writer …
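
The same read-with-one-encoding, write-with-another idea can be sketched in Python, the language used in the excerpt above. This is an analogue for comparison, not the .NET code from the answer. The function name and both path parameters are hypothetical.

```python
import shutil


def convert_latin1_to_utf8(src_path, dst_path):
    """Re-encode a text file from ISO 8859-1 to UTF-8 (illustrative helper)."""
    # newline="" turns off newline translation so line endings pass through unchanged.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as reader, \
         open(dst_path, "w", encoding="utf-8", newline="") as writer:
        # Stream the decoded text in chunks rather than loading the whole file.
        shutil.copyfileobj(reader, writer)
```

As in the .NET version, the reader decodes with the source encoding and the writer encodes with the target one, so no manual handling of byte arrays is needed.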