How to transliterate Cyrillic to Latin text

You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic and many other scripts to Latin. Example usage:

    Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
    Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
    Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

    /// <summary>
    /// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
    /// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
    /// With converting "ё" to "yo".
    /// </summary>
    [TestMethod]
    public void RussianAlphabetTest() …
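
To make the idea of table-driven transliteration concrete, here is a minimal Python sketch. It is not UnidecodeSharpFork's implementation: the function name `transliterate` and the mapping table are assumptions made for illustration, and the table is a simplified reading of BGN/PCGN that always maps "е" to "e", maps "ё" to "yo", and drops the hard and soft signs.

```python
# Hypothetical, simplified lowercase table loosely based on BGN/PCGN.
# Assumption: the hard and soft signs are dropped and "е" is always "e".
_CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ё": "yo", "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts",
    "ч": "ch", "ш": "sh", "щ": "shch", "ъ": "", "ы": "y", "ь": "",
    "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace Russian Cyrillic letters with Latin approximations."""
    out = []
    for ch in text:
        latin = _CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # not in the table: pass through unchanged
        elif ch.isupper():
            out.append(latin.capitalize())  # "Ж" becomes "Zh"
        else:
            out.append(latin)
    return "".join(out)
```

With this table, `transliterate("Работа с кириллицей")` gives `"Rabota s kirillitsey"`, and text that is already Latin passes through untouched. A real library covers far more scripts and edge cases than a single lookup table can.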

PHP Transliteration

You can use iconv, which supports transliteration through a suffix on the target encoding. From the libiconv documentation:

    When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.

Source: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here …

Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

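I have done this recently in Java:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    public static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    private static String stripDiacritics(String str) {
        // Decompose each character into its base letter plus combining marks.
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the combining marks, modifier letters and modifier symbols.
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }

This will do as you specified:

    stripDiacritics("Björn") = Bjorn

but it will fail on, for example, Białystok, because the ł character is a letter in its own right rather than a base letter plus a diacritic, so normalization leaves it unchanged. If …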
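
The same approach can be sketched in Python with the standard `unicodedata` module. This is an illustrative port, not the answer's own code: the function name `strip_diacritics` is an assumption, and the explicit U+0300 to U+036F range check stands in for Java's `\p{InCombiningDiacriticalMarks}` block.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Drop combining marks, modifier letters and modifier symbols."""
    # NFD splits "ö" into "o" followed by U+0308 COMBINING DIAERESIS.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        # Combining Diacritical Marks block (U+0300 to U+036F).
        if not 0x0300 <= ord(ch) <= 0x036F
        # Lm = modifier letters, Sk = modifier symbols.
        and unicodedata.category(ch) not in ("Lm", "Sk")
    )
```

Here `strip_diacritics("Björn")` returns `"Bjorn"`, while `strip_diacritics("Białystok")` returns the input unchanged, because ł has no canonical decomposition. Letters like that need a separate explicit mapping.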