The ultimate emoji encoding scheme

MySQL’s utf8 charset is not actually UTF-8; it’s a subset of UTF-8 that only supports the Basic Multilingual Plane (characters up to U+FFFF). Most emoji use code points higher than U+FFFF. MySQL’s utf8mb4 is actual UTF-8, which can encode all those code points. Outside of MySQL there’s no such thing as “utf8mb4”; there’s just UTF-8. So: Does …
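To make the distinction concrete, here is a minimal Python sketch that checks whether a string stays within the Basic Multilingual Plane, i.e. whether every character needs at most three bytes in UTF-8 and could therefore fit in MySQL’s legacy utf8. The helper name fits_utf8mb3 is hypothetical, not a MySQL or Python API.

```python
def fits_utf8mb3(text: str) -> bool:
    """Return True if every code point in text is at most U+FFFF (the BMP).

    Hypothetical helper: characters above U+FFFF need four UTF-8 bytes,
    which MySQL's legacy 3-byte utf8 (utf8mb3) cannot store.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


if __name__ == "__main__":
    for sample in ["café", "\u2600", "\U0001F604"]:
        # Longest UTF-8 byte sequence among the characters of each sample.
        widest = max(len(ch.encode("utf-8")) for ch in sample)
        print(repr(sample), fits_utf8mb3(sample), widest)
```

In this sketch “café” (at most 2 bytes per character) and ☀ (U+2600, 3 bytes) fit, while 😄 (U+1F604, 4 bytes) does not.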

SQL doesn’t differentiate u and ü although collation is utf8mb4_unicode_ci

Collation and character set are two different things. A character set is just an ‘unordered’ list of characters and their representation. utf8mb4 is a character set and covers a lot of characters. Collation defines the order of characters (it determines the result of ORDER BY, for example) and defines other rules (such as which characters or …
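Collation rules are applied by the server, but the effect in the question (u and ü comparing as equal under utf8mb4_unicode_ci) can be approximated in Python. This is only an analogy, assuming that decomposing to NFD, dropping combining marks and case-folding is close enough for Latin letters; it is not MySQL’s actual collation implementation, and the names loose_key and loosely_equal are hypothetical.

```python
import unicodedata


def loose_key(text: str) -> str:
    """Approximate an accent- and case-insensitive comparison key.

    Hypothetical helper: decompose characters (NFD), drop combining marks
    such as the diaeresis in 'ü', then case-fold what remains.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings by their loose keys, like a *_ci collation would."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    print(loosely_equal("u", "ü"))           # equal under the comparison rule
    print(loosely_equal("Müller", "MULLER"))  # accents and case both ignored
    print("u" == "ü")                         # the stored characters still differ
```

Both loose comparisons return True while the plain == check returns False: the character set keeps u and ü distinct, and only the comparison rule treats them as equal.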

iPhone emoticons insert into MySQL but become blank values

Most iOS emoji use code points above the Basic Multilingual Plane (BMP) of the Unicode table. For example, 😄 (SMILING FACE WITH OPEN MOUTH AND SMILING EYES) is at U+1F604. Now, see http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html. MySQL before version 5.5 only supports UTF-8 for the BMP, which includes characters between U+0000 and U+FFFF (i.e. only a subset of actual …
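A few lines of Python show why 😄 falls outside what a BMP-only utf8 can hold: its code point is above U+FFFF, so UTF-8 needs four bytes for it.

```python
emoji = "\U0001F604"  # SMILING FACE WITH OPEN MOUTH AND SMILING EYES

code_point = ord(emoji)
encoded = emoji.encode("utf-8")

print(f"U+{code_point:04X}")              # U+1F604
print(encoded.hex(" "))                   # f0 9f 98 84
print(len(encoded), code_point > 0xFFFF)  # 4 True
```

The leading byte 0xF0 marks a four-byte sequence, which is exactly the case a three-byte-per-character storage format has no room for.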

MySQL utf8mb4: errors when saving emojis

character_set_client, character_set_connection, and character_set_results must all be utf8mb4 for that shortcake to be eatable. Something, somewhere, is setting only a subset of those individually. Rummage through my.cnf and phpMyAdmin’s settings: something is not setting all three. If SET NAMES utf8mb4 is executed, all three are set correctly. The sun shone because it is only 3 bytes – …
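One way to find the culprit is to compare the session values reported by SHOW VARIABLES LIKE 'character_set_%' against utf8mb4. The Python sketch below assumes those rows have already been fetched into a dict; the sample values are invented to illustrate a mismatch, and mismatched_charsets is a hypothetical helper. It only reports which of the three variables are not utf8mb4.

```python
from __future__ import annotations

# The three session variables that must agree for 4-byte characters to survive.
REQUIRED = ("character_set_client", "character_set_connection", "character_set_results")


def mismatched_charsets(variables: dict[str, str], wanted: str = "utf8mb4") -> list[str]:
    """Return the names of the required variables not set to `wanted`.

    `variables` is assumed to map variable names to values, e.g. built
    from the rows of SHOW VARIABLES LIKE 'character_set_%'.
    Missing variables are reported as mismatched.
    """
    return [name for name in REQUIRED if variables.get(name) != wanted]


if __name__ == "__main__":
    # Invented example: the client side was switched, the rest was not.
    session = {
        "character_set_client": "utf8mb4",
        "character_set_connection": "utf8",
        "character_set_results": "utf8",
        "character_set_server": "latin1",
    }
    print(mismatched_charsets(session))
```

With the invented values above, the sketch reports character_set_connection and character_set_results; after SET NAMES utf8mb4, which sets all three at once, it would report nothing.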

What is the difference between utf8mb4 and utf8 charsets in MySQL?

UTF-8 is a variable-length encoding: storing one code point requires one to four bytes. However, MySQL’s encoding called “utf8” (an alias of “utf8mb3”) only stores a maximum of three bytes per code point. So the character set “utf8”/“utf8mb3” cannot store all Unicode code points: it only supports the …
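The one-to-four-byte rule follows from fixed code point ranges. The sketch below computes the length from those ranges and cross-checks it against Python’s own encoder; anything that needs four bytes is exactly what utf8mb3 cannot store. The function name utf8_length is hypothetical.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for a single Unicode scalar value."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are not encodable in UTF-8")
    if code_point < 0x80:       # U+0000..U+007F: ASCII
        return 1
    if code_point < 0x800:      # U+0080..U+07FF
        return 2
    if code_point < 0x10000:    # U+0800..U+FFFF: rest of the BMP
        return 3
    return 4                    # U+10000..U+10FFFF: supplementary planes


if __name__ == "__main__":
    for ch in ["A", "ü", "\u2600", "\U0001F604"]:
        cp = ord(ch)
        n = utf8_length(cp)
        # Cross-check the range rule against the real encoder.
        assert n == len(ch.encode("utf-8"))
        print(f"U+{cp:04X}", n, "fits utf8mb3" if n <= 3 else "needs utf8mb4")
```

Only the last sample, 😄 at U+1F604, lands in the four-byte range.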