How to detect whether a character belongs to a Right To Left language?

Unicode characters have different properties associated with them. These properties cannot be derived from the code point; you need a table that tells you if a character has a certain property or not.

You are interested in characters with bidirectional property “R” or “AL” (RandALCat).

A RandALCat character is a character with unambiguously right-to-left directionality.

Here’s the complete list as of Unicode 3.2 (from RFC 3454):

D. Bidirectional tables

D.1 Characters with bidirectional property "R" or "AL"

----- Start Table D.1 -----
05BE
05C0
05C3
05D0-05EA
05F0-05F4
061B
061F
0621-063A
0640-064A
066D-066F
0671-06D5
06DD
06E5-06E6
06FA-06FE
0700-070D
0710
0712-072C
0780-07A5
07B1
200F
FB1D
FB1F-FB28
FB2A-FB36
FB38-FB3C
FB3E
FB40-FB41
FB43-FB44
FB46-FBB1
FBD3-FD3D
FD50-FD8F
FD92-FDC7
FDF0-FDFC
FE70-FE74
FE76-FEFC
----- End Table D.1 -----

Here’s some code to get the complete list as of Unicode 6.0:

var url = "http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt";

var query = from record in new WebClient().DownloadString(url).Split('\n')
            where !string.IsNullOrEmpty(record)
            let properties = record.Split(';')
            where properties[4] == "R" || properties[4] == "AL"
            select int.Parse(properties[0], NumberStyles.AllowHexSpecifier);

foreach (var codepoint in query)
{
    Console.WriteLine(codepoint.ToString("X4"));
}

Note that these values are Unicode code points. Strings in C#/.NET are UTF-16 encoded and need to be converted to Unicode code points first (see Char.ConvertToUtf32). Here’s a method that checks if a string contains at least one RandALCat character:

static void IsAnyCharacterRightToLeft(string s)
{
    for (var i = 0; i < s.Length; i += char.IsSurrogatePair(s, i) ? 2 : 1)
    {
        var codepoint = char.ConvertToUtf32(s, i);
        if (IsRandALCat(codepoint))
        {
            return true;
        }
    }
    return false;
}

Leave a Comment