Force XDocument to write to String with UTF-8 encoding

Try this:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

class Test
{
    static void Main()
    {
        XDocument doc = XDocument.Load("test.xml", LoadOptions.PreserveWhitespace);
        doc.Declaration = new XDeclaration("1.0", "utf-8", null);
        // A plain StringWriter reports UTF-16, and Save() writes the writer's
        // encoding into the XML declaration; this subclass reports UTF-8 instead.
        StringWriter writer = new Utf8StringWriter();
        doc.Save(writer, SaveOptions.None);
        Console.WriteLine(writer);
    }

    private class Utf8StringWriter : StringWriter
    {
        public override Encoding Encoding { get { return Encoding.UTF8; } }
    }
}
```

…
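For comparison, here is the same idea sketched with Python's standard xml.etree.ElementTree rather than XDocument (a swapped-in library, not the answer's C# code). The encoding named in the XML declaration follows the encoding you ask the serializer for, so the sketch requests UTF-8 bytes and then decodes them into a string. It assumes Python 3.8+ for the xml_declaration argument; the element names and the helper to_utf8_xml_string are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_utf8_xml_string(root: ET.Element) -> str:
    """Serialize `root` with a UTF-8 XML declaration and return it as str."""
    # encoding="utf-8" makes the declaration say utf-8;
    # xml_declaration=True forces the declaration to be written (Python 3.8+).
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    # The bytes really are UTF-8, so decoding them yields a str whose
    # declaration and content agree.
    return data.decode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample document containing a non-ASCII character.
    root = ET.Element("root")
    ET.SubElement(root, "name").text = "José"
    print(to_utf8_xml_string(root))
```

As in the C# version, the point is that the declared encoding and the actual encoding come from the same place, so they cannot disagree.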

Posting UTF-8 encoded data to a server loses certain characters

After much research and many attempts to make things work, I finally found a solution to the problem, and it is a simple addition to the existing code: pass the charset "UTF-8" to the UrlEncodedFormEntity constructor:

```java
// Encode the form parameters as UTF-8 instead of the constructor's default charset
form = new UrlEncodedFormEntity(nameValuePairs, "UTF-8");
```

After this change, the characters were encoded correctly and delivered properly to the server side.
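The effect of the charset argument can be reproduced with Python's urllib.parse.urlencode, used here as a stand-in for Apache HttpClient rather than as the answer's code. The sketch assumes that the no-charset UrlEncodedFormEntity constructor falls back to ISO-8859-1 (HttpClient 4.x's default content charset). Under that assumption, a character outside Latin-1 such as € cannot be represented and degrades to "?", which is one way characters get lost. The helper encode_form and the field name "p" are hypothetical.

```python
from urllib.parse import urlencode


def encode_form(pairs, charset="utf-8"):
    """Percent-encode name/value pairs as an
    application/x-www-form-urlencoded body using `charset`."""
    # errors="replace" mimics a charset encoder that substitutes "?"
    # for characters the charset cannot represent.
    return urlencode(pairs, encoding=charset, errors="replace")


if __name__ == "__main__":
    pairs = [("p", "ó€")]  # hypothetical form field
    print(encode_form(pairs, "utf-8"))       # p=%C3%B3%E2%82%AC
    print(encode_form(pairs, "iso-8859-1"))  # p=%F3%3F  (the € became "?")
```

With UTF-8 every character survives as a multi-byte percent-encoded sequence; with Latin-1, "ó" fits in one byte but "€" is replaced before it ever reaches the server.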

Convert UTF-8 to base64 string

It's a little difficult to tell what you're trying to achieve, but assuming you're trying to get a Base64 string that, when decoded, is abcdef==, the following should work:

```csharp
using System;
using System.Text;

byte[] bytes = Encoding.UTF8.GetBytes("abcdef==");  // text -> UTF-8 bytes
string base64 = Convert.ToBase64String(bytes);      // bytes -> Base64 text
Console.WriteLine(base64);
```

This will output YWJjZGVmPT0=, which is abcdef== encoded in Base64.

Edit: To decode a Base64 string, simply …
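The same two steps (text to UTF-8 bytes, then bytes to Base64) look like this in Python with the standard base64 module. The decode helper is a sketch of the reverse round trip, not a reconstruction of the elided part of the answer; the helper names to_base64 and from_base64 are hypothetical.

```python
import base64


def to_base64(text: str) -> str:
    """Encode `text` as UTF-8, then Base64-encode those bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(b64: str) -> str:
    """Reverse of to_base64: Base64-decode, then read the bytes as UTF-8."""
    return base64.b64decode(b64).decode("utf-8")


if __name__ == "__main__":
    encoded = to_base64("abcdef==")
    print(encoded)               # YWJjZGVmPT0=
    print(from_base64(encoded))  # abcdef==
```

Note that the "==" in the input is just text here; it is not Base64 padding until the string has actually been encoded.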

SQLite, Python, Unicode, and non-UTF data

Regarding "I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it": repr() and unicodedata.name() are your friends when it comes to debugging such problems (this session is Python 2):

```python
>>> oacute_latin1 = "\xF3"
>>> oacute_unicode = oacute_latin1.decode('latin1')
>>> oacute_utf8 = oacute_unicode.encode('utf8')
>>> print repr(oacute_latin1)
'\xf3'
>>> print repr(oacute_unicode)
u'\xf3'
>>> …
```
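In Python 3 the same experiment is simpler because str is already Unicode and raw bytes are a separate type. A sketch of the latin-1 to UTF-8 round trip, plus the mangling you get when the UTF-8 bytes are decoded with the wrong codec (the variable names simply mirror the session above):

```python
import unicodedata

oacute_latin1 = b"\xf3"                           # 'ó' as a single Latin-1 byte
oacute_unicode = oacute_latin1.decode("latin-1")  # the code point U+00F3
oacute_utf8 = oacute_unicode.encode("utf-8")      # two bytes in UTF-8

print(repr(oacute_latin1))               # b'\xf3'
print(repr(oacute_unicode))              # 'ó'
print(repr(oacute_utf8))                 # b'\xc3\xb3'
print(unicodedata.name(oacute_unicode))  # LATIN SMALL LETTER O WITH ACUTE

# Reading the UTF-8 bytes back as Latin-1 is what produces the mangled text:
print(oacute_utf8.decode("latin-1"))     # Ã³
```

So the conversion itself is lossless; mangling appears only when bytes are decoded with a codec other than the one that produced them.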

Using Swift to unescape Unicode characters, i.e. \u1234

It's fairly similar in Swift, though you still need to use the Foundation string classes:

```swift
let transform = "Any-Hex/Java"
let input = "\\u5404\\u500b\\u90fd" as NSString
var convertedString = input.mutableCopy() as NSMutableString

// Run the "Any-Hex/Java" transform in reverse (last argument 1 = true),
// turning the \uXXXX escapes back into characters.
CFStringTransform(convertedString, nil, transform as NSString, 1)

println("convertedString: \(convertedString)")
// convertedString: 各個都
```

(The last parameter threw me for a loop until I realized that Boolean …
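Outside Foundation, the reverse "Any-Hex/Java" transform can be approximated with a regular expression. The sketch below does that in Python instead of using ICU transforms, so it is a swapped-in technique: it only handles the \uXXXX form, and the helper name unescape_java is hypothetical.

```python
import re

# Matches a Java-style escape: backslash, 'u', exactly four hex digits.
_JAVA_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_java(s: str) -> str:
    """Turn Java-style \\uXXXX escapes into the characters they name."""
    # Replace each \uXXXX with the UTF-16 code unit it encodes.
    replaced = _JAVA_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)
    # Characters outside the BMP arrive as surrogate pairs (e.g. \uD83D\uDE00);
    # a UTF-16 round trip joins each pair into one character.
    # An unpaired surrogate escape makes this decode raise UnicodeDecodeError.
    return replaced.encode("utf-16", "surrogatepass").decode("utf-16")


if __name__ == "__main__":
    print(unescape_java("\\u5404\\u500b\\u90fd"))  # 各個都
```

The surrogate-pair step matters because Java-style escapes describe UTF-16 code units, not code points.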

Replacing invalid UTF-8 characters with question marks; mbstring.substitute_character seems to be ignored

You can use mb_convert_encoding(), or htmlspecialchars()'s ENT_SUBSTITUTE option (since PHP 5.4). Of course, you can use preg_match() too. If you use intl, you can use UConverter (since PHP 5.5). The recommended substitute character for an invalid byte sequence is U+FFFD; see "3.1.2 Substituting for Ill-Formed Subsequences" in UTR #36: Unicode Security Considerations for the details. When using …
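For comparison, Python's bytes.decode exposes the same two choices through its errors argument: the built-in "replace" handler inserts U+FFFD, while a custom handler registered with codecs.register_error can substitute a question mark instead. This is a Python sketch, not a PHP mbstring equivalent; the handler name "question_mark" and the helper clean_utf8 are made up for illustration.

```python
import codecs


def _question_mark(exc):
    # Hypothetical handler: emit "?" for the ill-formed bytes
    # and resume decoding right after them (exc.end).
    return ("?", exc.end)


codecs.register_error("question_mark", _question_mark)


def clean_utf8(data: bytes, use_fffd: bool = True) -> str:
    """Decode `data` as UTF-8, replacing invalid byte sequences."""
    # "replace" inserts U+FFFD, the recommended substitute character.
    return data.decode("utf-8", "replace" if use_fffd else "question_mark")


if __name__ == "__main__":
    raw = b"abc\xffdef"  # 0xFF can never appear in well-formed UTF-8
    print(ascii(clean_utf8(raw)))           # 'abc\ufffddef'
    print(clean_utf8(raw, use_fffd=False))  # abc?def
```

U+FFFD is usually the better default because, unlike "?", it cannot be confused with a real character from the input.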