Unicode Processing in C++

  • Use ICU (or a similar library) to process your text data.
  • In your own data store, make sure everything is stored in one and the same encoding (for example UTF-8), converting data at the point where it enters.
  • Always use your Unicode library for mundane tasks such as measuring string length or checking case. Never use standard library builtins like std::isalpha unless their byte-oriented, locale-dependent definition is really the one you want.
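  • I can’t say it enough: never iterate over the indices of a string if you care about correctness. An index into a std::string addresses a byte, not a character, and one character can span several bytes. Always use your Unicode library for this.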
