MySQL distinction between e and é (e acute) – UNIQUE index

and collation is “utf8_general_ci”. And that’s the answer. If you’re using the utf8_general_ci collation (actually it applies to all utf_…_[ci|cs] collations), then diacritics are ignored in comparisons, thus: SELECT 'e' = 'é' AND 'O' = 'Ó' AND 'ä' = 'a' returns 1. Indexes also use the collation, so a UNIQUE index treats such values as duplicates. If you want to distinguish between ą and a then …
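The effect on a UNIQUE index can be sketched outside MySQL. The Python below is only a rough stand-in for an accent- and case-insensitive collation, not MySQL's actual comparison algorithm: the helper names `fold` and `UniqueIndex` are hypothetical, and stripping combining marks after NFD normalization is an assumption that happens to match the examples above.

```python
import unicodedata


def fold(text: str) -> str:
    """Approximate an accent- and case-insensitive collation key (assumption:
    decompose to NFD, drop combining marks, then case-fold)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


class UniqueIndex:
    """Toy unique index that compares keys through fold(), like a *_ci collation."""

    def __init__(self):
        self._keys = {}

    def insert(self, value: str) -> bool:
        key = fold(value)
        if key in self._keys:
            return False  # rejected: collides with an existing entry
        self._keys[key] = value
        return True


index = UniqueIndex()
print(index.insert("e"))   # first value is accepted
print(index.insert("é"))   # same folded key, so it is treated as a duplicate
```

Under this approximation "e" and "é" share one key, which is why the second insert is rejected. A comparison that keeps the original code points (the toy equivalent of a binary collation) would accept both.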

What is the most common encoding of each language?

On the web, UTF-8 is by far the most common encoding for all languages. That being said, here are the Windows XP locales grouped by default character encoding (“Language for non-Unicode programs”):
Big5: zh_HK, zh_MO, zh_TW
GBK (≈GB2312): zh_CN, zh_SG
Windows-31J (≈Shift_JIS): ja_JP
windows-874 (≈TIS-620, ISO-8859-11): th_TH
windows-949 (≈EUC-KR): ko_KR
windows-1250: bs_BA, cs_CZ, hr_BA, hr_HR, …
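As a minimal sketch of how such a grouping might be used, the Python below maps the locales visible in the list above to Python codec names and decodes legacy bytes accordingly. The table `LEGACY_CODEC` and the function `decode_legacy` are hypothetical names, the table covers only the locales shown before the excerpt is cut off, and falling back to UTF-8 for unknown locales is an assumption.

```python
# Locale -> Python codec name, limited to the locales visible in the excerpt.
LEGACY_CODEC = {
    "zh_HK": "big5", "zh_MO": "big5", "zh_TW": "big5",
    "zh_CN": "gbk", "zh_SG": "gbk",
    "ja_JP": "cp932",   # Python's name for Windows-31J
    "th_TH": "cp874",
    "ko_KR": "cp949",
    "bs_BA": "cp1250", "cs_CZ": "cp1250", "hr_BA": "cp1250", "hr_HR": "cp1250",
}


def decode_legacy(data: bytes, locale: str) -> str:
    """Decode bytes using the locale's legacy codec, else UTF-8 (assumed default)."""
    codec = LEGACY_CODEC.get(locale, "utf-8")
    return data.decode(codec)


# Round-trip a Japanese string through the Windows-31J codec.
raw = "日本語".encode("cp932")
print(decode_legacy(raw, "ja_JP"))
```

Decoding with the wrong codec either raises `UnicodeDecodeError` or silently produces mojibake, so a lookup like this is only a starting guess when the real encoding is not declared.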