- Use ICU (or a similar library) to handle your data
- In your own data store, make sure everything is stored in the same encoding
- Always use your Unicode library for mundane tasks like string length, capitalization status, etc. Never use standard library builtins like `is_alpha` unless that is the definition you want.
- I can’t say it enough: never iterate over the indices of a `string` if you care about correctness; always use your Unicode library for this.
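
To illustrate why index-based iteration goes wrong, here is a minimal Python sketch using only the standard `unicodedata` module. The helper `approx_clusters` is a hypothetical name introduced for this example, and it is a deliberate simplification: it only attaches combining marks to the preceding base character. It is not a substitute for the full grapheme segmentation rules that ICU or a similar library implements (emoji sequences, Hangul, regional indicators and so on are not handled).

```python
import unicodedata


def approx_clusters(text: str) -> list[str]:
    """Group each base code point with the combining marks that follow it.

    Simplified illustration only (hypothetical helper): real grapheme
    segmentation has many more rules, so use a Unicode library in practice.
    """
    clusters: list[str] = []
    for ch in text:
        # combining() returns a non-zero class for combining marks
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# "café" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "cafe\u0301"
# The same text with the precomposed character U+00E9
composed = unicodedata.normalize("NFC", decomposed)

# The two forms display identically but have different lengths
len_decomposed = len(decomposed)  # 5 code points
len_composed = len(composed)      # 4 code points

# Reversing by index detaches the accent from its base letter
naive_reverse = decomposed[::-1]

# Reversing cluster by cluster keeps the accent on the 'e'
cluster_reverse = "".join(reversed(approx_clusters(decomposed)))
```

The same visible word has two different lengths depending on how it is encoded, and the index-based reversal produces a string that starts with a stray combining accent. A proper Unicode library handles both problems, including the many cases this sketch ignores.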