How do I compare Unicode strings that have different bytes but the same value?

Unicode normalization will get you there for this one:

>>> import unicodedata
>>> unicodedata.normalize("NFC", "\uf9fb") == "\u7099"
True

Call unicodedata.normalize on both of your strings, using the same normalization form (such as "NFC") for each, before comparing them with ==. That checks for canonical Unicode equivalence.
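If you do this comparison in more than one place, it may help to wrap it in a small function. The sketch below is only an illustration: the name canonically_equal and its form parameter are made up for this answer and are not part of unicodedata.

```python
import unicodedata

def canonically_equal(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both
    to the same Unicode normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

# Compatibility ideograph vs. the unified ideograph it maps to.
print(canonically_equal("\uf9fb", "\u7099"))
# "e" followed by a combining acute accent vs. precomposed "é".
print(canonically_equal("e\u0301", "\u00e9"))
```

Normalizing both arguments inside the helper means callers cannot accidentally normalize only one side or mix forms.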

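Character U+F9FB is a “CJK Compatibility” ideograph. Most characters of this kind, including this one, have a canonical decomposition to a single regular (unified) CJK ideograph, here U+7099. Normalization replaces the compatibility character with that regular character, which is why the two strings compare equal afterwards.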
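To see the mapping for yourself, you can ask unicodedata for the character's decomposition and try each normalization form. This is a minimal sketch; the helper name normalizes_to is invented for illustration.

```python
import unicodedata

ch = "\uf9fb"

# decomposition() returns the mapping as space-separated hex code points.
mapping = unicodedata.decomposition(ch)
print(mapping)

def normalizes_to(source: str, target: str) -> dict:
    """Hypothetical helper: report, per normalization form, whether
    normalizing `source` yields exactly `target`."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return {form: unicodedata.normalize(form, source) == target for form in forms}

print(normalizes_to(ch, "\u7099"))
```

Because the decomposition is canonical rather than a compatibility mapping, all four forms should agree for this character, so the choice between NFC and NFD does not matter here as long as both strings get the same one.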
