String concatenation containing Arabic and Western characters

You can embed bidirectional (bidi) regions using Unicode format control code points:

  • Left-to-right embedding (U+202A)
  • Right-to-left embedding (U+202B)
  • Pop directional formatting (U+202C)

So in Java, to embed an RTL language like Arabic in an LTR language like English, you would write

myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish

and to do the reverse

myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
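The two concatenations above can be wrapped in small helpers so the control characters are not scattered through the code. This is only a sketch: the class name BidiEmbed, the method names embedRtl and embedLtr, and the sample strings are made up for illustration and are not part of any library.

```java
final class BidiEmbed {
    // The three format control code points listed above.
    static final char LRE = '\u202A'; // left-to-right embedding
    static final char RLE = '\u202B'; // right-to-left embedding
    static final char PDF = '\u202C'; // pop directional formatting

    private BidiEmbed() {}

    // Hypothetical helper: marks text as a right-to-left run inside LTR text.
    static String embedRtl(String text) {
        return RLE + text + PDF;
    }

    // Hypothetical helper: marks text as a left-to-right run inside RTL text.
    static String embedLtr(String text) {
        return LRE + text + PDF;
    }

    public static void main(String[] args) {
        // Sample Arabic word, written with escapes so the source stays ASCII.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String mixed = "Greeting: " + embedRtl(arabic) + " (end of English)";

        // The control characters are invisible but still count toward length().
        System.out.println(mixed);
        System.out.println(mixed.length());
    }
}
```

Each embedding is closed with its own pop character, so the surrounding text keeps its original direction. How the result looks still depends on the console or widget that renders it.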

See Bidirectional General Formatting for more details, or the Unicode specification chapter on “Directional Formatting Codes” for the source material.
