latin1 - w3toppers.com

Python 3 chokes on CP-1252/ANSI reading

Position 0x81 is unassigned in Windows-1252 (aka cp1252). In Latin-1 (aka ISO 8859-1) it is assigned to U+0081 HIGH OCTET PRESET (HOP), a control character. I can reproduce your error in Python 3.1 like this:

    >>> b'\x81'.decode('cp1252')
    Traceback (most recent call last):
    …
    UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 0: character maps to …
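A short Python 3 sketch of the behaviour described above. Strict cp1252 decoding rejects 0x81, while latin-1 maps it to U+0081. Using errors="replace" is one possible workaround chosen here for illustration; the excerpt does not prescribe it. The loop lists every byte in 0x80-0x9F that Python's cp1252 codec leaves unmapped.

```python
raw = b"\x81"

# Strict cp1252 decoding raises, reproducing the error from the question.
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    print(type(exc).__name__, "at byte offset", exc.start)

# latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so 0x81 becomes U+0081 (a C1 control character) instead of an error.
print(repr(raw.decode("latin-1")))

# errors="replace" keeps cp1252 but substitutes U+FFFD for unmapped bytes.
print(repr(raw.decode("cp1252", errors="replace")))

# Collect every byte in 0x80-0x9F that Python's cp1252 codec cannot decode.
undefined = []
for value in range(0x80, 0xA0):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined.append(hex(value))
print(undefined)  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
```

Note that falling back to latin-1 for a whole file also changes the meaning of bytes such as 0x93, which cp1252 decodes as a curly quote. errors="replace" avoids that at the cost of losing the unmapped bytes.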

Using .NET how to convert ISO 8859-1 encoded text files that contain Latin-1 accented characters to UTF-8

You need to get the proper Encoding object. ASCII is exactly what its name says: it supports only 7-bit ASCII characters, so it cannot represent Latin-1 accented characters. If what you want to do is convert files, this is likely easier than dealing with the byte arrays directly:

    using (System.IO.StreamReader reader = new System.IO.StreamReader(fileName, Encoding.GetEncoding("iso-8859-1"))) { using (System.IO.StreamWriter writer …
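The .NET answer reads through a StreamReader built with Encoding.GetEncoding("iso-8859-1") and writes through a StreamWriter. The rest of that snippet is elided, so the following is not a translation of it. It is a minimal Python sketch of the same read-as-Latin-1, write-as-UTF-8 idea. The helper name convert_latin1_to_utf8 and the sample file names are hypothetical.

```python
import os
import shutil
import tempfile


def convert_latin1_to_utf8(src_path: str, dst_path: str) -> None:
    """Re-encode an ISO 8859-1 text file as UTF-8 (hypothetical helper)."""
    # newline="" on both sides passes \r\n and \n through unchanged.
    with open(src_path, "r", encoding="iso-8859-1", newline="") as reader, \
         open(dst_path, "w", encoding="utf-8", newline="") as writer:
        # Copy in chunks so large files never need to fit in memory.
        shutil.copyfileobj(reader, writer)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "latin1.txt")
    dst = os.path.join(workdir, "utf8.txt")
    with open(src, "wb") as f:
        f.write(b"caf\xe9\r\n")  # "café" in ISO 8859-1: é is the single byte 0xE9
    convert_latin1_to_utf8(src, dst)
    with open(dst, "rb") as f:
        print(f.read())  # b'caf\xc3\xa9\r\n': é is now two UTF-8 bytes
```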

How to detect UTF-8 characters in a Latin1 encoded column – MySQL

Character encoding, like time zones, is a constant source of problems. What you can do is look for any “high-ASCII” characters (bytes 0x80 and above), as these are either LATIN1 accented characters or symbols, or part of a UTF-8 multi-byte character. Telling the difference isn't going to be easy unless you cheat a bit. To figure out what …
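Here is a minimal Python sketch of one way to "cheat". It assumes you can fetch the column's raw bytes, for example with HEX(col) or CAST(col AS BINARY) in MySQL. Bytes that are all below 0x80 read the same in either encoding. High bytes that decode as strict UTF-8 are probably UTF-8, and anything else is treated as Latin-1. The helper name classify_bytes is hypothetical. This is a heuristic only: a short Latin-1 string such as "Ã©" happens to be valid UTF-8 as well.

```python
def classify_bytes(raw: bytes) -> str:
    """Guess whether raw column bytes hold ASCII, UTF-8, or Latin-1 text.

    Hypothetical helper; a heuristic, not a proof.
    """
    if raw.isascii():
        # Bytes 0x00-0x7F mean the same thing in ASCII, Latin-1 and UTF-8.
        return "ascii"
    try:
        # Strict decoding rejects lead bytes without valid continuation bytes.
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "latin1"
    # High bytes that form valid UTF-8 sequences: most likely UTF-8.
    return "utf8"


for sample in (b"plain", "café".encode("utf-8"), "café".encode("latin-1")):
    print(sample, classify_bytes(sample))
```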

Differences between utf8 and latin1

UTF-8 is prepared for world domination; Latin1 isn't. If you try to store non-Latin characters such as Chinese, Japanese, Hebrew, or Russian using Latin1 encoding, they will end up as mojibake. You may find the introductory text of this article useful (even more so if you know a bit of Java). Note that full 4-byte UTF-8 …
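The following Python sketch illustrates the difference. UTF-8 spends 1 to 4 bytes per character and can encode any of them. Latin-1 always uses 1 byte but can represent only U+0000-U+00FF. The final line shows how mojibake arises when UTF-8 bytes are read back as Latin-1. The helper byte_lengths and the sample characters are illustrative choices, not taken from the answer.

```python
from typing import Optional, Tuple


def byte_lengths(ch: str) -> Tuple[int, Optional[int]]:
    """Return (UTF-8 byte count, Latin-1 byte count or None if unencodable)."""
    utf8_len = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
    try:
        latin1_len: Optional[int] = len(ch.encode("latin-1"))  # 1 byte when it works
    except UnicodeEncodeError:
        latin1_len = None  # character has no Latin-1 code
    return utf8_len, latin1_len


# Latin letter, accented letter, Cyrillic, CJK, emoji (escaped for clarity).
for ch in ["A", "é", "Ж", "\u4e2d", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", byte_lengths(ch))

# Mojibake: the two UTF-8 bytes of "é" misread as two Latin-1 characters.
print("é".encode("utf-8").decode("latin-1"))  # Ã©
```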