utf-8 - w3toppers.com

Force XDocument to write to String with UTF-8 encoding

Try this:

    using System;
    using System.IO;
    using System.Text;
    using System.Xml.Linq;

    class Test
    {
        static void Main()
        {
            XDocument doc = XDocument.Load("test.xml", LoadOptions.PreserveWhitespace);
            doc.Declaration = new XDeclaration("1.0", "utf-8", null);
            StringWriter writer = new Utf8StringWriter();
            doc.Save(writer, SaveOptions.None);
            Console.WriteLine(writer);
        }

        private class Utf8StringWriter : StringWriter
        {
            public override Encoding Encoding { get { return Encoding.UTF8; } }
        }
    …

Post UTF-8 encoded data to server loses certain characters

After much research and many attempts to get things working, I finally found a solution to the problem: a simple addition to the existing code. The solution was to pass "UTF-8" as the second argument to the UrlEncodedFormEntity constructor:

    form = new UrlEncodedFormEntity(nameValuePairs, "UTF-8");

After this change, the characters were encoded and delivered properly to the server side.
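The effect of naming the charset explicitly can be seen outside Java too. The following is a minimal Python sketch, not the original Android code; the helper name `encode_form` and the sample field are assumptions made for illustration. It shows how the same form value produces different percent-encoded bytes depending on the encoding chosen.

```python
from urllib.parse import urlencode


def encode_form(fields, charset="utf-8"):
    """Percent-encode form fields using the given character encoding.

    Hypothetical helper for illustration only.
    """
    return urlencode(fields, encoding=charset)


# The same value, encoded with two different charsets.
utf8_body = encode_form({"name": "José"})
latin1_body = encode_form({"name": "José"}, charset="latin-1")

print(utf8_body)    # name=Jos%C3%A9
print(latin1_body)  # name=Jos%E9
```

If the server decodes the body as UTF-8 but the client encoded it with another charset, non-ASCII characters such as "é" arrive damaged or are dropped, which matches the symptom described above.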

Detecting utf8 broken characters in MySQL

I fixed it with:

    UPDATE wp_zcs9ck_posts_copy SET post_title = CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);

Complete solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
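The round trip in that query (text to latin1 bytes, then the same bytes read as UTF-8) can be sketched in Python to see why it repairs doubly encoded text. This is only an illustration: the function name `fix_double_encoded` is hypothetical, and Python's `latin-1` codec is assumed to be close enough to MySQL's `latin1` for the sample string, which may not hold for every character.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one layer of UTF-8-read-as-Latin-1 mangling, if present.

    Hypothetical helper: map each character back to a single Latin-1 byte,
    then decode the resulting bytes as UTF-8. If either step fails, the
    text was not doubly encoded in this way, so return it unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: UTF-8 bytes misread as Latin-1.
mangled = "café".encode("utf-8").decode("latin-1")

print(mangled)                      # cafÃ©
print(fix_double_encoded(mangled))  # café
print(fix_double_encoded("café"))   # café (already correct, left alone)
```

The `try`/`except` plays the role of a safety check: strings that are already correct generally fail the second decode and are returned untouched.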

Convert UTF-8 to base64 string

It's a little difficult to tell what you're trying to achieve, but assuming you're trying to get a Base64 string that, when decoded, is abcdef==, the following should work:

    byte[] bytes = Encoding.UTF8.GetBytes("abcdef==");
    string base64 = Convert.ToBase64String(bytes);
    Console.WriteLine(base64);

This will output YWJjZGVmPT0=, which is abcdef== encoded in Base64.

Edit: To decode a Base64 string, simply …
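For comparison, here is the same two-step idea (text to UTF-8 bytes, bytes to Base64) as a small Python sketch. The helper names `to_base64` and `from_base64` are made up for this example and are not part of the original answer.

```python
import base64


def to_base64(text: str) -> str:
    """Encode text as UTF-8 bytes, then as a Base64 string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(encoded: str) -> str:
    """Decode a Base64 string back to bytes, then interpret them as UTF-8."""
    return base64.b64decode(encoded).decode("utf-8")


encoded = to_base64("abcdef==")
print(encoded)               # YWJjZGVmPT0=
print(from_base64(encoded))  # abcdef==
```

Base64 always operates on bytes, so the character encoding chosen in the first step determines the result for any non-ASCII input.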

UTF-8 & Unicode, what’s with 0xC0 and 0x80?

It's not a comparison with 0xc0, it's a bitwise AND operation with 0xc0. The bit mask 0xc0 is 11 00 00 00, so what the AND is doing is extracting only the top two bits:

        ab cd ef gh
    AND 11 00 00 00
        -- -- -- --
      = ab 00 00 00

This is …
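A short Python sketch of how that mask is commonly used: in UTF-8, every continuation byte has the bit pattern 10xxxxxx, so `(byte & 0xC0) == 0x80` identifies it. The function names below are illustrative, and counting non-continuation bytes assumes the input is already valid UTF-8.

```python
def is_continuation_byte(b: int) -> bool:
    """True if the top two bits of the byte are 10 (a UTF-8 continuation byte)."""
    return (b & 0xC0) == 0x80


def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes."""
    return sum(1 for b in data if not is_continuation_byte(b))


sample = "héllo".encode("utf-8")  # 'é' takes two bytes: 0xC3 0xA9

print(len(sample))                 # 6
print(count_code_points(sample))   # 5
print(is_continuation_byte(0xA9))  # True
print(is_continuation_byte(0xC3))  # False
```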

SQLite, python, unicode, and non-utf data

"I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it"

repr() and unicodedata.name() are your friends when it comes to debugging such problems:

    >>> oacute_latin1 = "\xF3"
    >>> oacute_unicode = oacute_latin1.decode('latin1')
    >>> oacute_utf8 = oacute_unicode.encode('utf8')
    >>> print repr(oacute_latin1)
    '\xf3'
    >>> print repr(oacute_unicode)
    u'\xf3'
    >>> …
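The session above is Python 2. As a rough Python 3 counterpart (an adaptation, not part of the original answer), the same conversion starts from a `bytes` object, because Python 3 separates bytes from text:

```python
import unicodedata

# One byte, 0xF3, which is 'ó' in Latin-1.
oacute_latin1 = b"\xf3"

# Decode the Latin-1 byte to text, then encode that text as UTF-8.
oacute_unicode = oacute_latin1.decode("latin-1")
oacute_utf8 = oacute_unicode.encode("utf-8")

print(repr(oacute_latin1))               # b'\xf3'
print(repr(oacute_unicode))              # 'ó'
print(repr(oacute_utf8))                 # b'\xc3\xb3'
print(unicodedata.name(oacute_unicode))  # LATIN SMALL LETTER O WITH ACUTE
```

The character survives intact as long as the bytes are decoded with the encoding they were actually written in.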

ASCII vs Unicode + UTF-8

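Today, ASCII is effectively a subset of UTF-8 rather than a separate scheme: the 128 ASCII characters are encoded in UTF-8 as the same single bytes. UTF-8 is therefore backwards compatible with ASCII, and any ASCII text is also valid UTF-8.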
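A quick way to check this compatibility is to encode the same ASCII-only text both ways and compare the bytes. The sample string below is arbitrary.

```python
text = "Hello, UTF-8!"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")

# For ASCII-only text the two encodings produce identical bytes.
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)  # True

# Every byte stays below 0x80, so the high bit is never set.
all_seven_bit = all(b < 0x80 for b in utf8_bytes)
print(all_seven_bit)  # True

# A non-ASCII character needs more than one byte in UTF-8.
e_acute = "é".encode("utf-8")
print(e_acute)  # b'\xc3\xa9'
```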

Convert std::string to QString

QString::fromStdString(content) is better since it is more robust. Also note that if the std::string is encoded in UTF-8, it should give exactly the same result as QString::fromUtf8(content.data(), int(content.size())).

Using Swift to unescape unicode characters, ie \u1234

It's fairly similar in Swift, though you still need to use the Foundation string classes:

    let transform = "Any-Hex/Java"
    let input = "\\u5404\\u500b\\u90fd" as NSString
    var convertedString = input.mutableCopy() as NSMutableString
    CFStringTransform(convertedString, nil, transform as NSString, 1)
    println("convertedString: \(convertedString)")
    // convertedString: 各個都

(The last parameter threw me for a loop until I realized that Boolean …

Replacing invalid UTF-8 characters by question marks, mbstring.substitute_character seems ignored

You can use mb_convert_encoding() or the ENT_SUBSTITUTE option of htmlspecialchars() since PHP 5.4. Of course, you can use preg_match() too. If you use intl, you can use UConverter since PHP 5.5. The recommended substitute character for an invalid byte sequence is U+FFFD. See "3.1.2 Substituting for Ill-Formed Subsequences" in UTR #36: Unicode Security Considerations for the details. When using …
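The substitution approach is not specific to PHP. As a minimal Python sketch of the same idea (the function name `replace_invalid_utf8` and the sample bytes are assumptions for illustration), decoding with `errors="replace"` turns each invalid sequence into U+FFFD, which can then be swapped for a question mark if that is what the output requires:

```python
def replace_invalid_utf8(data: bytes, substitute: str = "\ufffd") -> str:
    """Decode bytes as UTF-8, replacing invalid sequences with `substitute`.

    Hypothetical helper. Note that a custom substitute also replaces any
    genuine U+FFFD characters that were already present in the input.
    """
    decoded = data.decode("utf-8", errors="replace")
    if substitute == "\ufffd":
        return decoded
    return decoded.replace("\ufffd", substitute)


broken = b"abc\xffdef"  # 0xFF can never appear in valid UTF-8

print(replace_invalid_utf8(broken) == "abc\ufffddef")  # True
print(replace_invalid_utf8(broken, "?"))               # abc?def
```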