SQL doesn’t differentiate u and ü although collation is utf8mb4_unicode_ci

Collation and character set are two different things. A character set is just an ‘unordered’ list of characters and their representation. utf8mb4 is a character set and covers a lot of characters. A collation defines the order of characters (it determines the end result of ORDER BY, for example) and defines other rules (such as which characters or …
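To make the distinction concrete, here is a minimal Python sketch. It is not MySQL's collation algorithm; the helper `accent_insensitive_key` is a hypothetical name, and stripping combining marks after NFD normalization is only a rough approximation of an accent- and case-insensitive comparison rule. It shows that "u" and "ü" are distinct characters with distinct bytes, yet can compare as equal once a comparison rule ignores accents.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Hypothetical helper: build a key that ignores accents and case."""
    # NFD splits "ü" into "u" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Character set level: two different characters, two different byte sequences.
print("u".encode("utf-8"), "\u00fc".encode("utf-8"))

# Comparison-rule level: equal once accents and case are ignored.
print(accent_insensitive_key("M\u00fcller") == accent_insensitive_key("muller"))
```

If the two letters must stay distinct, the comparison rule has to change, not the character set.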

Is PHP str_word_count() multibyte safe?

I’d say you guessed right. And indeed, there are space characters in UTF-8 which are not part of US-ASCII. To give you an example of such spaces: Unicode Character ‘NO-BREAK SPACE’ (U+00A0), 2 bytes in UTF-8: 0xC2 0xA0 (c2a0). And perhaps as well: Unicode Character ‘NEXT LINE (NEL)’ (U+0085), 2 bytes in UTF-8: 0xC2 0x85 …
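The byte values quoted above can be checked with a short Python sketch. This illustrates the encodings only, not PHP's `str_word_count()`. The `count_words` helper and the sample string are assumptions made for the example, and the helper relies on Python's own Unicode-aware definition of whitespace.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE
NEL = "\u0085"   # NEXT LINE (NEL)

# Both characters take two bytes in UTF-8 and lie outside US-ASCII.
for name, ch in (("NBSP", NBSP), ("NEL", NEL)):
    encoded = ch.encode("utf-8")
    print(name, encoded.hex(), len(encoded), ch.isascii())


def count_words(text: str) -> int:
    """Hypothetical helper: split on any Unicode whitespace and count the parts."""
    return len(text.split())


sample = f"one{NBSP}two{NEL}three four"
print(count_words(sample))
```

A byte-oriented splitter that only knows the ASCII space would see `three four` as the sole word boundary in `sample`.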

Android WebView with garbled UTF-8 characters.

You can try to edit the settings of your WebView before you load the data: WebSettings settings = mWebView.getSettings(); settings.setDefaultTextEncodingName("utf-8"); Also, as suggested in the comment below, be sure to add "charset=utf-8" to the loadData call: mWebView.loadData(getString(R.string.info_texto), "text/html; charset=utf-8", "utf-8");

Which encoding opens CSV files correctly with Excel on both Mac and Windows?

Excel encodings: I found the WINDOWS-1252 encoding to be the least frustrating when dealing with Excel. Since it’s basically Microsoft’s own proprietary character set, one can assume it will work on both the Mac and the Windows version of MS-Excel. Both versions at least include a corresponding “File origin” or “File encoding” selector which correctly …
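As a rough sketch of producing such a file, the Python snippet below serializes rows to CSV and encodes the result as Windows-1252 (the `cp1252` codec). The helper name `rows_to_cp1252_csv` and the sample rows are assumptions for illustration, and how Excel then imports the bytes is not something the code can verify.

```python
import csv
import io


def rows_to_cp1252_csv(rows) -> bytes:
    """Hypothetical helper: serialize rows as CSV, encoded as Windows-1252."""
    buffer = io.StringIO(newline="")  # let the csv module control line endings
    csv.writer(buffer).writerows(rows)
    # Raises UnicodeEncodeError for characters that Windows-1252 cannot represent.
    return buffer.getvalue().encode("cp1252")


data = rows_to_cp1252_csv([["name", "city"], ["Zo\u00eb", "Z\u00fcrich"]])
print(data)
```

Each accented letter becomes a single byte here, which is also the limitation: text outside the Windows-1252 repertoire fails to encode instead of being written.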

Unicode, UTF, ASCII, ANSI format differences

Going down your list: “Unicode” isn’t an encoding, although unfortunately, a lot of documentation imprecisely uses it to refer to whichever Unicode encoding that particular system uses by default. On Windows and Java, this often means UTF-16; in many other places, it means UTF-8. Properly, Unicode refers to the abstract character set itself, not to …
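The difference between the abstract character and its encodings can be seen in a few lines of Python. This is only an illustrative sketch; the character chosen and the helper name `encodings_of` are assumptions made for the example.

```python
def encodings_of(ch: str) -> dict:
    """Hypothetical helper: hex bytes of one character under several encodings."""
    codecs = ("utf-8", "utf-16-le", "utf-32-le", "latin-1")
    return {codec: ch.encode(codec).hex() for codec in codecs}


e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# One code point in the abstract character set...
print(hex(ord(e_acute)))

# ...and several different byte sequences, depending on the encoding chosen.
for codec, hex_bytes in encodings_of(e_acute).items():
    print(codec, hex_bytes)
```

The code point stays U+00E9 throughout; only the byte representation changes from one encoding to the next.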

UTF-8, UTF-16, and UTF-32

UTF-8 has an advantage in the case where ASCII characters represent the majority of characters in a block of text, because UTF-8 encodes these into 8 bits (like ASCII). It is also advantageous in that a UTF-8 file containing only ASCII characters has the same encoding as an ASCII file. UTF-16 is better where ASCII …
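The size trade-off can be measured directly. The following Python sketch compares encoded lengths for two sample strings chosen purely for illustration; `encoded_sizes` is a hypothetical helper, and the `-le` codec variants are used so that no byte order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Hypothetical helper: byte length of text under each UTF encoding (no BOM)."""
    return {codec: len(text.encode(codec)) for codec in ("utf-8", "utf-16-le", "utf-32-le")}


ascii_text = "hello"
cjk_text = "\u65e5\u672c\u8a9e"  # three CJK characters

print(encoded_sizes(ascii_text))
print(encoded_sizes(cjk_text))

# An ASCII-only string produces identical bytes under UTF-8 and ASCII.
print(ascii_text.encode("utf-8") == ascii_text.encode("ascii"))
```

For the ASCII sample, UTF-8 is the smallest of the three. For the CJK sample, UTF-16 is smaller than UTF-8.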