UTF-8 Continuation Bytes

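A continuation byte in UTF-8 is any byte whose top two bits are 10, that is, any byte in the range 0x80 to 0xBF. Continuation bytes are the second and subsequent bytes of a multi-byte sequence. The first byte of a sequence (the lead byte) never has this pattern. The following table shows how each range of code points is encoded:

Unicode code points   Encoding                     Binary value
-------------------   --------------------------   -------------------
U+000000-U+00007f     0xxxxxxx                     0xxxxxxx
U+000080-U+0007ff     110yyyxx 10xxxxxx            00000yyy xxxxxxxx
U+000800-U+00ffff     1110yyyy 10yyyyxx 10xxxxxx   yyyyyyyy xxxxxxxx
U+010000-U+10ffff     11110zzz …                   000zzzzz yyyyyyyy …

In each row, the x, y and z bits of the code point's binary value are copied, in order, into the x, y and z positions of the encoding. Every byte after the first starts with the 10 continuation marker.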
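As a minimal sketch in Python, the continuation-byte test is a single mask-and-compare, and the first three table rows translate directly into shifts and masks. The function names `is_continuation`, `count_code_points` and `encode_up_to_ffff` are hypothetical, chosen for illustration. The encoder deliberately stops at U+FFFF and does not reject surrogate code points (U+D800 to U+DFFF), so it is not a complete or validating UTF-8 encoder; Python's built-in `str.encode("utf-8")` serves as the reference.

```python
def is_continuation(byte: int) -> bool:
    """True if the top two bits of `byte` are 10, i.e. 0x80 <= byte <= 0xBF."""
    return (byte & 0b1100_0000) == 0b1000_0000


def count_code_points(data: bytes) -> int:
    """Count code points in already-valid UTF-8 (no validation is done).

    Every code point has exactly one lead byte, so counting the bytes
    that are NOT continuation bytes yields one per code point.
    """
    return sum(1 for b in data if not is_continuation(b))


def encode_up_to_ffff(cp: int) -> bytes:
    """Hypothetical helper: encode U+0000..U+FFFF using the table's first three rows.

    Assumption: surrogates (U+D800..U+DFFF) are not rejected here,
    unlike a strict UTF-8 encoder.
    """
    if cp <= 0x7F:
        return bytes([cp])                                # 0xxxxxxx
    if cp <= 0x7FF:
        return bytes([0b1100_0000 | (cp >> 6),            # 110yyyxx
                      0b1000_0000 | (cp & 0x3F)])         # 10xxxxxx
    if cp <= 0xFFFF:
        return bytes([0b1110_0000 | (cp >> 12),           # 1110yyyy
                      0b1000_0000 | ((cp >> 6) & 0x3F),   # 10yyyyxx
                      0b1000_0000 | (cp & 0x3F)])         # 10xxxxxx
    raise ValueError("4-byte sequences are not covered by this sketch")


# Usage: "a" (U+0061), "é" (U+00E9), "€" (U+20AC) encode to 61 C3 A9 E2 82 AC.
data = "aé€".encode("utf-8")
print([is_continuation(b) for b in data])  # [False, False, True, False, True, True]
print(count_code_points(data))             # 3
print(encode_up_to_ffff(0x20AC) == "€".encode("utf-8"))  # True
```

Counting lead bytes is a cheap way to get a string's length in code points without decoding it, but it only gives the right answer when the input is already known to be valid UTF-8.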