The problem here is that you’ve stored a UTF-8 string in your database as if it were in a different encoding, probably the Windows-1252 code page (CP1252). As a result the UTF-8 character ’, represented by the byte sequence E2 80 99, is translated into the three CP1252 single-byte characters â€™. This was all explained to you previously in this answer, which also gives a solution to your current problem.
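To make the corruption concrete, here is a minimal sketch that reproduces it. The class and method names (MojibakeDemo, MisreadUtf8As1252) are made up for illustration, and the RegisterProvider call assumes a runtime where CodePagesEncodingProvider is available (.NET Core, .NET 5+, or .NET Framework 4.6 and later):

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Hypothetical helper: decodes the UTF-8 bytes of a string as if they
    // were Windows-1252, which reproduces the corruption described above.
    public static string MisreadUtf8As1252(string original)
    {
        // On .NET Core / .NET 5+ code-page encodings are not registered by default.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);   // E2 80 99 for U+2019
        return Encoding.GetEncoding("windows-1252").GetString(utf8Bytes);
    }
}
```

Calling MojibakeDemo.MisreadUtf8As1252("\u2019") returns the three characters U+00E2, U+20AC and U+2122, which is exactly the â€™ sequence you are seeing.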
In order to get back to the original UTF-8 encoding you will need to take the string returned from your database and correct it with the following code:
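public static string UTF8From1252(string source)
{
    // On .NET Core / .NET 5+ the windows-1252 encoding is only available after calling
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) once at startup.
    // get original UTF-8 bytes from CP1252-encoded string
    byte[] bytes = System.Text.Encoding.GetEncoding("windows-1252").GetBytes(source);
    // decode those bytes as the UTF-8 they originally were
    return System.Text.Encoding.UTF8.GetString(bytes);
}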
This highlights the fact that it is vital to use the correct encoding at all times when using the GetBytes method.
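As a rough illustration of why the encoding passed to GetBytes matters, the sketch below runs the same repair with a caller-chosen encoding. The names GetBytesEncodingDemo and Repair are hypothetical, and the RegisterProvider call again assumes a runtime that ships CodePagesEncodingProvider:

```csharp
using System;
using System.Text;

public static class GetBytesEncodingDemo
{
    // Hypothetical helper: re-encodes the corrupted string with the named
    // encoding, then decodes the resulting bytes as UTF-8.
    public static string Repair(string corrupted, string encodingName)
    {
        // Makes "windows-1252" resolvable on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        byte[] bytes = Encoding.GetEncoding(encodingName).GetBytes(corrupted);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

With the corrupted input "\u00E2\u20AC\u2122", Repair(input, "windows-1252") gives back ’, while Repair(input, "iso-8859-1") does not: the characters € and ™ do not exist in ISO-8859-1, so GetBytes cannot reproduce the original bytes 80 and 99.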
It is important to note that the reverse of this transformation is not always possible, since there are gaps in the CP1252 code space: byte values that have no character assigned and will be discarded or altered during conversion from byte values.
The hex values for these gaps are: 81 8D 8F 90 9D.
Unfortunately these values occur in the UTF-8 encodings of various characters, such as ” (E2 80 9D). If you have one of these values in your database then it will not load correctly. Depending on how you did the first-stage conversion, the third byte may be lost or corrupted in the database, in which case you cannot retrieve it.
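If you want to find out which strings are at risk, one possible check is to look for the gap bytes listed above in a string's UTF-8 encoding. This is only a sketch; Cp1252GapCheck and Utf8HitsGap are names invented for this example:

```csharp
using System;
using System.Linq;
using System.Text;

public static class Cp1252GapCheck
{
    // Byte values with no character assigned in CP1252 (the gaps listed above).
    private static readonly byte[] UnassignedBytes = { 0x81, 0x8D, 0x8F, 0x90, 0x9D };

    // Hypothetical helper: true if the UTF-8 encoding of the text contains
    // at least one byte that falls into a CP1252 gap.
    public static bool Utf8HitsGap(string text)
    {
        return Encoding.UTF8.GetBytes(text).Any(b => UnassignedBytes.Contains(b));
    }
}
```

For example, Utf8HitsGap("\u201D") is true because ” encodes as E2 80 9D, whereas Utf8HitsGap("\u2019") is false because ’ encodes as E2 80 99. Whether an affected character actually survives depends on how the original conversion treated the unassigned byte.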