character-encoding - w3toppers.com

Stylesheet taken-over/replaced by Chinese characters

So I think I figured it out. This is weird. But anyway. I copied and pasted your HTML to a local file to experiment with. And it loaded just fine. It was saved as UTF-8. Then I changed it to UTF-16, and I got exactly what you’re seeing! As far as I can tell, the browser …
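
To illustrate one way such a mix-up can produce Chinese-looking text, here is a minimal Python sketch. It assumes the stylesheet's ASCII bytes end up being read as UTF-16 (little-endian); the excerpt above is truncated, so this is a plausible mechanism rather than a confirmed diagnosis. The sample text and the helper `is_cjk` are made up for the demonstration.

```python
# ASCII CSS text, as it would sit on disk in a UTF-8 (or plain ASCII) file.
css_bytes = "body".encode("utf-8")  # bytes 62 6F 64 79

# Decoding the same bytes as UTF-16 little-endian pairs them up:
# 0x62,0x6F becomes U+6F62 and 0x64,0x79 becomes U+7964.
misread = css_bytes.decode("utf-16-le")


def is_cjk(ch: str) -> bool:
    """True if ch lies in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return 0x4E00 <= ord(ch) <= 0x9FFF


# Print code points only, so the demo does not depend on the terminal encoding.
print([hex(ord(c)) for c in misread])
```

Two lowercase ASCII letters always form a 16-bit value between 0x6161 and 0x7A7A, which falls inside the CJK block, so ordinary CSS keywords tend to turn into ideographs when misread this way.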

How to change ajax-charset?

You could use: contentType: "application/x-javascript; charset=ISO-8859-1"
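
Parameters in a Content-Type value are written as `name=value` pairs, so the charset is given as `charset=ISO-8859-1`. The short Python sketch below uses the standard library's `email.message.Message` to check how such a value is parsed; the helper name `charset_of` is made up for illustration and has nothing to do with the AJAX library in the question.

```python
from email.message import Message


def charset_of(content_type: str):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset lowercased, or None.
    return msg.get_content_charset()


print(charset_of("application/x-javascript; charset=ISO-8859-1"))
```

Note that the header only declares the encoding; the bytes sent must actually be in that encoding for the receiver to decode them correctly.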

How do I change a shell script’s character encoding?

Slowly, the Unix world is moving from ASCII and other regional encodings to UTF-8. You need to be running a UTF-8-capable terminal, such as a modern xterm or PuTTY. In your ~/.bash_profile, set your language to one of the UTF-8 variants: export LANG=C.UTF-8 or export LANG=en_AU.UTF-8, etc. You should then be able to write …

MySQL distinction between e and é (e acute) – UNIQUE index

and collation is "utf8_general_ci". And that’s the answer. If you’re using the utf8_general_ci collation (actually it applies to all utf_…_[ci|cs]), then diacritics are ignored in comparisons, thus: SELECT "e" = "é" AND "O" = "Ó" AND "ä" = "a" results in 1. Indexes also use the collation. If you want to distinguish between ą and a then …
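
The effect of an accent-insensitive comparison can be approximated in Python. This is only a rough sketch of the idea, not MySQL's actual collation algorithm: it decomposes each character, drops the combining marks, and case-folds the rest. The function name `accent_insensitive_key` is an assumption made for the example.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Build a comparison key that ignores diacritics and letter case."""
    # NFD splits "é" into "e" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(accent_insensitive_key("é") == accent_insensitive_key("e"))
```

Under such a key, "é" and "e" collide, which is why a UNIQUE index built on an accent-insensitive collation treats them as duplicates.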

What is the most common encoding of each language?

On the web, UTF-8 is by far the most common encoding for all languages. That being said, here are the Windows XP locales grouped by default character encoding (“Language for non-Unicode programs”):

Big5: zh_HK, zh_MO, zh_TW
GBK (≈GB2312): zh_CN, zh_SG
Windows-31J (≈Shift_JIS): ja_JP
windows-874 (≈TIS-620, ISO-8859-11): th_TH
windows-949 (≈EUC-KR): ko_KR
windows-1250: bs_BA, cs_CZ, hr_BA, hr_HR, …

Platform’s default charset on different platforms?

That’s a user-specific setting. On many modern Linux systems, it’s UTF-8. On Macs, it’s MacRoman. On Windows in the US and Western Europe, it’s often CP1252; in Central Europe, it’s CP1250. In China, you often find a Chinese encoding (a GB* variant for Simplified Chinese, or Big5 for Traditional Chinese). But that’s the system default, which each user can change at any time. Which is probably …
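
Because the default varies by machine and user, code that reads or writes text is usually safer when it names the encoding explicitly. The question appears to concern another platform, so the Python sketch below only illustrates the general idea; the helper names `describe_defaults` and `roundtrip` are made up for the example.

```python
import locale
import os
import sys
import tempfile


def describe_defaults() -> dict:
    """Report the encodings this interpreter picked up from the environment."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


def roundtrip(text: str, encoding: str = "utf-8") -> str:
    """Write and re-read text with an explicit encoding, ignoring the default."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding=encoding) as fh:
            fh.write(text)
        with open(path, "r", encoding=encoding) as fh:
            return fh.read()


print(describe_defaults())
```

The values printed by `describe_defaults()` differ between systems, while `roundtrip()` behaves the same everywhere because it never relies on them.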

Should we HTML-encode special characters before storing them in the database?

Don’t HTML-encode your characters before storage. You should store as pure a form of your data as possible. HTML encoding is needed because you are going to display the data on an HTML page, so do the encoding during the processing of the data to create the page. For example, suppose you decide you’re also …
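
A minimal Python sketch of the "store raw, encode on output" approach, using the standard library's `html.escape`. The variable `stored_comment` stands in for a value read back from the database, and `render_comment` is a hypothetical rendering helper.

```python
import html

# The value as it is kept in storage: raw, with no HTML entities.
stored_comment = '<b>Tom & "Jerry"</b>'


def render_comment(raw: str) -> str:
    """Escape the raw value only at the point where HTML is generated."""
    return "<p>" + html.escape(raw) + "</p>"


print(render_comment(stored_comment))
```

The same stored value can then be reused unchanged for other outputs, such as plain-text email or JSON, each applying its own escaping rules.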

What is the most efficient binary to text encoding?

This really depends on the nature of the binary data, and the constraints that “text” places on your output. First off, if your binary data is not compressed, try compressing before encoding. We can then assume that the distribution of 1/0 or individual bytes is more or less random. Now: why do you need text? …
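
To make the trade-off concrete, here is a small Python sketch that compares the output size of three common binary-to-text encodings from the standard library and applies the "compress first" advice. The choice of hex, Base64, and Base85 is an assumption for illustration; the truncated answer may discuss other schemes. The helper names and sample data are made up.

```python
import base64
import zlib


def encoded_sizes(data: bytes) -> dict:
    """Length in characters of the same data under three text encodings."""
    return {
        "hex": len(base64.b16encode(data)),      # 2 chars per byte
        "base64": len(base64.b64encode(data)),   # 4 chars per 3 bytes
        "base85": len(base64.b85encode(data)),   # 5 chars per 4 bytes
    }


def compress_then_encode(data: bytes) -> bytes:
    """Compress with zlib first, then encode the result as Base64 text."""
    return base64.b64encode(zlib.compress(data))


# 1200 bytes of highly repetitive sample data.
sample = b"\x00\x01\x02\x03" * 300

print(encoded_sizes(sample))
print(len(compress_then_encode(sample)))
```

Denser alphabets save space but use more punctuation characters, so which one is "most efficient" still depends on what the receiving channel accepts as text.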

How can I be sure of the file encoding?

$ file --mime my.txt
my.txt: text/plain; charset=iso-8859-1
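
A tool like `file` can only guess from the bytes it sees. As a rough complement, the Python sketch below checks whether data decodes cleanly as UTF-8. The function name `looks_like_utf8` is made up, and a passing check is a strong hint rather than proof of the intended encoding.

```python
def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


utf8_sample = "café".encode("utf-8")          # b'caf\xc3\xa9'
latin1_sample = "café".encode("iso-8859-1")   # b'caf\xe9'

print(looks_like_utf8(utf8_sample), looks_like_utf8(latin1_sample))
```

The reverse test is not possible for ISO-8859-1: every byte value decodes to some character in that encoding, so it never fails, which is one reason detection cannot give certainty.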

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 3 2: ordinal not in range(128)

You can print Unicode objects as well; you don’t need to wrap them in str(). Assuming you really want a str: when you do str(u'\u2013') you are trying to convert the Unicode string to an 8-bit string. To do this you need to use an encoding, a mapping from Unicode data to 8-bit data. What …
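
The question is about Python 2, where `str()` implicitly uses the ASCII codec. The sketch below reproduces the same failure in Python 3 by encoding explicitly, then shows that naming a suitable encoding avoids it. The helper `to_bytes` is made up for the example.

```python
dash = "\u2013"  # EN DASH, a character outside the ASCII range


def to_bytes(text: str, encoding: str) -> bytes:
    """Convert text to bytes using the named encoding."""
    return text.encode(encoding)


# ASCII has no mapping for U+2013, so this raises UnicodeEncodeError.
try:
    to_bytes(dash, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode character; U+2013 becomes three bytes.
utf8_bytes = to_bytes(dash, "utf-8")

print(ascii_failed, utf8_bytes)
```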