Handling special characters

The problem here is that you’ve stored a UTF-8 string in your database as though it were in a different encoding, probably the Windows-1252 code page (CP1252). As a result, the UTF-8 character represented by the byte sequence E2 80 99 (the right single quotation mark ’, U+2019) is turned into the three CP1252 single-byte characters â€™. This was explained to you in a previous answer, which also gives a solution to your current problem.
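
To see the corruption happen, here is a small sketch that reproduces it: it encodes text as UTF-8 and then decodes those bytes as if they were Windows-1252, one character per byte. The class and method names (MojibakeDemo, Garble) are made up for illustration. It assumes .NET Core or .NET 5+, where code page 1252 is only available after registering CodePagesEncodingProvider (some targets may also need the System.Text.Encoding.CodePages package).

```csharp
using System.Text;

// Hypothetical helper class, for illustration only.
public static class MojibakeDemo
{
    static MojibakeDemo()
    {
        // On .NET Core / .NET 5+, Windows code pages such as 1252 are only
        // available after this provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Reproduces the mistake: the UTF-8 bytes of the text are decoded
    // as if they were Windows-1252 (one character per byte).
    public static string Garble(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);    // U+2019 -> E2 80 99
        return Encoding.GetEncoding(1252).GetString(utf8Bytes); // E2 80 99 -> "â€™"
    }
}
```

With this, MojibakeDemo.Garble("\u2019") should return "\u00E2\u20AC\u2122", which displays as the three characters â€™.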

To get back the original text, take the string returned from your database and correct it with the following code:

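public static string UTF8From1252(string source)
{
    // Re-encode the garbled text as CP1252 to recover the original UTF-8 bytes.
    // On .NET Core / .NET 5+, first call (once, at startup):
    // System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
    byte[] bytes = System.Text.Encoding.GetEncoding("windows-1252").GetBytes(source);
    // Decode those bytes as the UTF-8 they really were.
    return System.Text.Encoding.UTF8.GetString(bytes);
}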
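
As a usage sketch, here is the same technique wrapped in a self-contained class so it can be tried directly on a garbled value. The class name EncodingRepair is hypothetical, and the static constructor assumes .NET Core or .NET 5+, where the code-page provider has to be registered before "windows-1252" can be looked up.

```csharp
using System.Text;

// Hypothetical wrapper class, for illustration only.
public static class EncodingRepair
{
    static EncodingRepair()
    {
        // Needed on .NET Core / .NET 5+ before code page 1252 can be used.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Turn the garbled text back into the bytes it came from,
    // then decode those bytes as UTF-8.
    public static string UTF8From1252(string source)
    {
        byte[] bytes = Encoding.GetEncoding("windows-1252").GetBytes(source);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

For example, EncodingRepair.UTF8From1252("It\u00E2\u20AC\u2122s") should give back "It\u2019s", i.e. Itâ€™s becomes It’s.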

This highlights why it is vital to always use the correct encoding when calling the GetBytes method.
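
To make the point concrete, the sketch below asks GetBytes for the bytes of the same character under two encodings. The names GetBytesComparison, Utf8Hex and Cp1252Hex are hypothetical, and the provider registration again assumes .NET Core or .NET 5+.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class GetBytesComparison
{
    static GetBytesComparison()
    {
        // Windows code pages must be registered on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Hex dump (e.g. "E2-80-99") of what GetBytes produces for the text.
    private static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static string Utf8Hex(string text) => HexBytes(text, Encoding.UTF8);

    public static string Cp1252Hex(string text) => HexBytes(text, Encoding.GetEncoding(1252));
}
```

For the right single quotation mark (U+2019), Utf8Hex returns "E2-80-99" while Cp1252Hex returns "92": the same character, three bytes versus one, so reading bytes back with the wrong encoding cannot give the original text.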

It is important to note that this transformation cannot always be reversed, because there are gaps in the CP1252 code space: byte values with no assigned character, which will be discarded or altered when the bytes are converted to text.

The hex values for these gaps are: 81 8D 8F 90 9D.

Unfortunately these values occur in the UTF-8 encodings of various characters, such as E2 80 9D (the right double quotation mark ”, U+201D). If your database contains one of these characters, it will not load correctly. Depending on how the first-stage conversion was done, that third byte may have been lost or corrupted in the database, in which case you cannot retrieve it.
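
If you want to know in advance which text is at risk, one option is to UTF-8-encode it and look for any of the gap bytes listed above. This is a minimal sketch with a hypothetical name (HasCp1252Gap); it only flags text that may not survive the UTF-8 to CP1252 round trip and cannot tell you whether your database actually dropped a byte.

```csharp
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class Cp1252GapCheck
{
    // Byte values with no character assigned in Windows-1252: 81 8D 8F 90 9D.
    private static readonly byte[] Gaps = { 0x81, 0x8D, 0x8F, 0x90, 0x9D };

    // True if the UTF-8 encoding of the text contains a CP1252 gap byte,
    // i.e. the text may be damaged by being stored as CP1252.
    public static bool HasCp1252Gap(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return utf8.Any(b => Gaps.Contains(b));
    }
}
```

For example, the right double quotation mark (E2 80 9D) and Á (C3 81) are flagged, while the right single quotation mark (E2 80 99) is not.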
