Python the same char not equals

Here unicodedata.normalize might help you.

Basically if you normalize the data coming from the db, and normalize your selection to the same form, you should have a better result when using str.find, str.__contains__ (i.e. in), str.index, and friends.

>>> u1 = chr(281)
>>> u2 = chr(101) + chr(808)
>>> print(u1, u2)
ę ę
>>> u1 == u2
False
>>> unicodedata.normalize('NFC', u2) == u1
True

NFC stands for the Normal Form Composed form. You can read up here for some description of the other possible forms.

Leave a Comment