transliteration - w3toppers.com

Transliteration in Ruby

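Ruby has an Iconv library in its stdlib that converts between encodings in much the same way as the usual iconv command. Note that Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in Ruby 2.0; on later versions it is available as the iconv gem.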

Convert accented characters into ASCII characters

How to transliterate Cyrillic to Latin text

You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic and many other languages to Latin. Example usage:

    Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
    Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
    Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

    /// <summary>
    /// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
    /// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
    /// With converting "ё" to "yo".
    /// </summary>
    [TestMethod]
    public void RussianAlphabetTest() …

Character Translation using Python (like the tr command)

See string.translate. In Python 2:

    import string
    "abc".translate(string.maketrans("abc", "def"))  # => "def"

Note the doc's comments about subtleties in the translation of unicode strings. In Python 3, you can use the str methods directly, calling translate on the string you want to convert:

    "abc".translate(str.maketrans("abc", "def"))  # => "def"

Edit: Since tr is a bit more advanced, also consider using re.sub.
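As a small illustration of the Python 3 form, the sketch below builds translation tables with str.maketrans, including its optional third argument for deleting characters, and falls back to re.sub for a character range, which str.maketrans does not expand on its own. The helper names tr and shift_lowercase are made up for this example and are not part of any library.

```python
import re


def tr(text, src, dst, delete=""):
    """Map each character of src to the one at the same position in dst,
    and drop every character listed in delete (hypothetical helper)."""
    return text.translate(str.maketrans(src, dst, delete))


def shift_lowercase(text):
    """Rough equivalent of tr 'a-y' 'b-z': replace each letter in a-y
    with its successor, using re.sub with a replacement function."""
    return re.sub(r"[a-y]", lambda m: chr(ord(m.group()) + 1), text)


print(tr("abc", "abc", "def"))           # def
print(tr("hello-world_1", "-_", "+/"))   # hello+world/1
print(tr("a1b2c3", "", "", "123"))       # abc
print(shift_lowercase("abc xyz"))        # bcd yzz
```

str.maketrans requires src and dst to have the same length, so a range such as a-z has to be spelled out in full or handled with a regular expression as above.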

How do you map-replace characters in Javascript similar to the ‘tr’ function in Perl?

There isn't a built-in equivalent, but you can get close to one with replace:

    data = data.replace(/[\-_]/g, function (m) {
        return { '-': '+', '_': '/' }[m];
    });

Python and character normalization

I recommend using the Unidecode module:

    >>> from unidecode import unidecode
    >>> unidecode(u'ıöüç')
    'iouc'

Note how you feed it a unicode string and it outputs a byte string (this is the Python 2 behaviour; on Python 3 both the input and the output are str). The output is guaranteed to be ASCII.
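For comparison, a standard-library-only approximation is sketched below. It is not how Unidecode works internally: it only decomposes characters with unicodedata.normalize and discards whatever is left outside ASCII, so a letter without a decomposition, such as the dotless ı, is dropped rather than transliterated. The function name ascii_fold is an assumption made for this example.

```python
import unicodedata


def ascii_fold(text):
    """Decompose to NFKD, then drop every code point that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fold("öüç"))    # ouc
print(ascii_fold("ıöüç"))   # ouc  (ı has no decomposition, so it is lost)
```

This difference in the second call is the main reason to reach for a transliteration table such as Unidecode's when the input is not limited to accented Latin letters.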

PHP Transliteration

You can use iconv, which has a special transliteration encoding. From the libiconv documentation (http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html):

"When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character."

See here …

Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

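I have done this recently in Java:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    public static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    private static String stripDiacritics(String str) {
        // Decompose accented letters into a base letter plus combining marks.
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the marks (and modifier letters/symbols), keeping the base letters.
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }

This will do as you specified:

    stripDiacritics("Björn") = Bjorn

but it will fail on, for example, Białystok, because ł is a separate letter rather than a base letter with a diacritic, so NFD normalization leaves it unchanged. If …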
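The normalize-then-strip idea, and one possible way around letters that do not decompose, can be sketched in Python (used here only to keep the added examples in one language). The sketch removes every combining character after NFD decomposition, which is slightly broader than the Java pattern, and then applies a small hand-written table. The table FALLBACK and the function names are hypothetical, and the table covers only a handful of letters.

```python
import unicodedata

# Hypothetical, deliberately tiny table for letters that have no
# canonical decomposition and therefore survive NFD unchanged.
FALLBACK = str.maketrans({
    "ł": "l", "Ł": "L",
    "ø": "o", "Ø": "O",
    "đ": "d", "Đ": "D",
})


def strip_marks(text):
    """Decompose to NFD and drop all combining characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_diacritics(text):
    """strip_marks plus the fallback table for non-decomposing letters."""
    return strip_marks(text).translate(FALLBACK)


print(strip_marks("Björn"))           # Bjorn
print(strip_marks("Białystok"))       # Białystok (ł is untouched)
print(strip_diacritics("Białystok"))  # Bialystok
```

A hand-maintained table like this only scales to a known set of letters; for arbitrary input a full transliteration library is the safer choice.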