MySQL matching unicode characters with ascii version

You specified the name column as text CHARACTER SET utf8 COLLATE utf8_unicode_ci which tells MySQL to consider e and é as equivalent in matching and sorting. That collation and utf8_general_ci both make a lot of things equivalent.

http://www.collation-charts.org/ is a great resource once you learn how to read the charts, which is pretty easy.

If you want e and é etc. to be considered different then you must choose a different collation. To find out what collations are on your server (assuming you’re limited to UTF-8 encoding):

mysql> show collation like 'utf8%';

And choose using the collation charts as a reference.

One more special collation is utf8_bin in which there are no equivalencies, it’s a binary match.

The only MySQL Unicode collations I’m aware of that are not language specific are utf8_unicode_ci, utf8_general_ci and utf8_bin. They are rather weird. The real purpose of a collation is to make the computer match and sort as a person from somewhere would expect. Hungarian and Turkish dictionaries have their entries ordered according to different rules. Specifying a collation allows you to sort and match according to such local rules.

For example, it seems Danes consider e and é equivalent but Icelanders don’t:

mysql> select _utf8'e' collate utf8_danish_ci
    -> = _utf8'é' collate utf8_danish_ci as equal;
+-------+
| equal |
+-------+
|     1 |
+-------+

mysql> select _utf8'e' collate utf8_icelandic_ci
    -> = _utf8'é' collate utf8_icelandic_ci as equal;
+-------+
| equal |
+-------+
|     0 |
+-------+

Another handy trick is to fill a one column table with a bunch of characters you’re interested in (it’s easier from a script) and then MySQL can tell you the equivalencies:

mysql> create table t (c char(1) character set utf8);
mysql> insert into t values ('a'), ('ä'), ('á');
mysql> select group_concat(c) from t group by c collate utf8_icelandic_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a               |
| á               |
| ä               |
+-----------------+

mysql> select group_concat(c) from t group by c collate utf8_danish_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,á             |
| ä               |
+-----------------+

mysql> select group_concat(c) from t group by c collate utf8_general_ci;
+-----------------+
| group_concat(c) |
+-----------------+
| a,ä,á           |
+-----------------+

Leave a Comment