How to output a Unicode string to RTF (using C#)

Provided that all the characters that you’re catering for exist in the Basic Multilingual Plane (it’s unlikely that you’ll need anything more), each character is a single UTF-16 code unit, so escaping the string one UTF-16 code unit at a time should suffice.

Wikipedia:

All possible code points from U+0000 through U+10FFFF, except for the surrogate code points U+D800–U+DFFF (which are not characters), are uniquely mapped by UTF-16 regardless of the code point’s current or future character assignment or use.

The following sample program illustrates doing something along the lines of what you want:

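using System;
using System.IO;
using System.Text;

static void Main(string[] args)
{
    // "ë" (U+00EB), decoded from its UTF-16 little-endian bytes
    char[] ca = Encoding.Unicode.GetChars(new byte[] { 0xeb, 0x00 });
    // The using block closes the file even if writing throws.
    // \rtf1 declares RTF major version 1 in the header.
    using (var sw = new StreamWriter(@"c:/helloworld.rtf"))
    {
        sw.WriteLine(@"{\rtf1
{\fonttbl {\f0 Times New Roman;}}
\f0\fs60 H" + GetRtfUnicodeEscapedString(new String(ca)) + @"llo, World!
}");
    }
}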

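static string GetRtfUnicodeEscapedString(string s)
{
    var sb = new StringBuilder();
    foreach (var c in s)
    {
        if (c == '\\' || c == '{' || c == '}')
            sb.Append('\\').Append(c);   // RTF control characters need a backslash
        else if (c <= 0x7f)
            sb.Append(c);                // plain ASCII passes through unchanged
        else
        {
            // \uN takes a signed 16-bit decimal, so code units above 32767 wrap to negative
            short n = unchecked((short)c);
            sb.Append("\\u" + n.ToString(System.Globalization.CultureInfo.InvariantCulture) + "?");
        }
    }
    return sb.ToString();
}

The important bit is the numeric conversion of c. It yields the UTF-16 code unit for the character, which for any character in the Basic Multilingual Plane is the same as its code point. The RTF \u escape takes that value as a signed 16-bit decimal number, so code units above 32767 are written as negative numbers. The ? that follows is the fallback character displayed by readers that do not understand \u. The System.Text.Encoding.Unicode encoding corresponds to UTF-16 (little-endian) as per the MSDN documentation.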
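
Two details of the \u escape are easy to miss: the number is a signed 16-bit decimal, and a character outside the Basic Multilingual Plane occupies two UTF-16 code units. The sketch below works through both. EscapeCodeUnit and RtfEscapeSketch are hypothetical names used only for this illustration, and writing a surrogate pair as two consecutive \u escapes is, as far as I know, what Word itself emits, but do check it against the readers you need to support.

```csharp
using System;
using System.Globalization;

static class RtfEscapeSketch
{
    // Hypothetical helper: builds the \uN? escape for a single UTF-16 code unit.
    public static string EscapeCodeUnit(char c)
    {
        // Reinterpret the code unit as signed 16-bit: 0x8000..0xFFFF become negative.
        short value = unchecked((short)c);
        // Invariant culture keeps the minus sign an ASCII '-' on every machine.
        return "\\u" + value.ToString(CultureInfo.InvariantCulture) + "?";
    }

    static void Main()
    {
        Console.WriteLine(EscapeCodeUnit('\u00EB'));   // \u235?
        Console.WriteLine(EscapeCodeUnit('\uAC00'));   // \u-21504?  (44032 - 65536)

        // U+1F600 lies outside the BMP, so the string holds two code units.
        foreach (char c in "\U0001F600")
            Console.Write(EscapeCodeUnit(c));
        Console.WriteLine();                           // \u-10179?\u-8704?
    }
}
```

Because foreach over a string walks code units rather than code points, the same loop handles both cases without any special treatment of surrogates.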
