Convert ISO-8859-1 strings to UTF-8 in C/C++

If your source encoding will always be ISO-8859-1, this is trivial. Here’s a loop:

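unsigned char *in, *out;  /* in: NUL-terminated ISO-8859-1 input; out: output buffer */
while (*in)
    if (*in<128) *out++=*in++;  /* ASCII: copy unchanged */
    else *out++=0xc2+(*in>0xbf), *out++=(*in++&0x3f)+0x80;  /* 0x80-0xff: two bytes */
*out=0;  /* terminate the output string */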
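
The arithmetic in that loop is the standard two-byte UTF-8 layout, 110xxxxx 10xxxxxx, specialised to code points 0x80 to 0xFF: the lead byte carries the top two bits of the character (so it is always 0xC2 or 0xC3) and the trail byte carries the low six bits. Here is a minimal sketch that spells this out with shifts and masks. The function name encode_latin1_char is made up for illustration and is not part of any library:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one ISO-8859-1 byte as UTF-8 into dst,
 * which must have room for 2 bytes. Returns the number of bytes
 * written (1 or 2). */
size_t encode_latin1_char(unsigned char c, unsigned char *dst)
{
    assert(dst != NULL);
    if (c < 0x80) {              /* ASCII range: identical in UTF-8 */
        dst[0] = c;
        return 1;
    }
    dst[0] = 0xC0 | (c >> 6);    /* 110000xx: top two bits, gives 0xC2 or 0xC3 */
    dst[1] = 0x80 | (c & 0x3F);  /* 10xxxxxx: low six bits */
    return 2;
}
```

For example, 0xE9 (é) has top bits 11 and low bits 101001, so it comes out as 0xC3 0xA9, the same result the compact expression produces.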

For safety, make sure the output buffer is at least twice as large as the input buffer, since every byte from 0x80 to 0xFF expands to two bytes. Otherwise, pass in a size limit and check it in the loop condition.
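
The size-limited variant might look like the sketch below. The function name latin1_to_utf8, its signature, and the choice of (size_t)-1 as the error value are assumptions made for this example, not an established API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to
 * UTF-8. Writes at most out_size bytes, including the terminating NUL.
 * Returns the number of bytes written (excluding the NUL), or
 * (size_t)-1 if the output buffer is too small. On failure the output
 * holds the NUL-terminated part that did fit. */
size_t latin1_to_utf8(const unsigned char *in, unsigned char *out,
                      size_t out_size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    while (*in) {
        size_t need = (*in < 0x80) ? 1 : 2;

        /* Always keep one byte in reserve for the terminating NUL. */
        if (n + need >= out_size) {
            out[n] = 0;
            return (size_t)-1;
        }
        if (*in < 0x80) {
            out[n++] = *in++;
        } else {
            out[n++] = 0xc2 + (*in > 0xbf);
            out[n++] = (*in++ & 0x3f) + 0x80;
        }
    }
    out[n] = 0;
    return n;
}
```

A caller that allocates 2 * strlen(in) + 1 bytes for the output never hits the error path. The check matters when the output buffer has a fixed size.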
