Transliteration in Ruby
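Ruby has an Iconv library in its stdlib which converts encodings in much the same way as the iconv command-line tool. (Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in 2.0; on newer versions, String#encode covers most of the same ground.)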
You can use the open-source .NET library UnidecodeSharpFork to transliterate Cyrillic and many other scripts to Latin. Example usage:

    Assert.AreEqual("Rabota s kirillitsey", "Работа с кириллицей".Unidecode());
    Assert.AreEqual("CZSczs", "ČŽŠčžš".Unidecode());
    Assert.AreEqual("Hello, World!", "Hello, World!".Unidecode());

Testing Cyrillic:

    /// <summary>
    /// According to http://en.wikipedia.org/wiki/Romanization_of_Russian BGN/PCGN.
    /// http://en.wikipedia.org/wiki/BGN/PCGN_romanization_of_Russian
    /// With converting "ё" to "yo".
    /// </summary>
    [TestMethod]
    public void RussianAlphabetTest() …
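For readers without .NET at hand, the table-lookup idea behind this kind of transliteration can be sketched in Python. The mapping below is a hand-written, simplified table loosely based on BGN/PCGN with "ё" as "yo"; it ignores the context-dependent rules for "е", drops the hard and soft signs, and is not UnidecodeSharpFork's actual data. The names `_LOWER`, `TABLE`, and `romanize_ru` are made up for this illustration.

```python
# Hypothetical, simplified Russian-to-Latin table (not Unidecode's own data).
_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}

# str.translate expects code points as keys; add capitalized variants too.
TABLE = {}
for cyr, lat in _LOWER.items():
    TABLE[ord(cyr)] = lat
    TABLE[ord(cyr.upper())] = lat.capitalize()


def romanize_ru(text):
    """Replace each Russian letter via TABLE; other characters pass through."""
    return text.translate(TABLE)


print(romanize_ru("Работа с кириллицей"))  # Rabota s kirillitsey
```

A real library covers far more scripts and edge cases; the sketch only shows why a per-character table is enough for the simple cases in the assertions above.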
See string.translate. In Python 2:

    import string
    "abc".translate(string.maketrans("abc", "def"))  # => "def"

Note the doc's comments about subtleties in the translation of unicode strings. In Python 3, maketrans is a static method of str, so no import is needed:

    "abc".translate(str.maketrans("abc", "def"))  # => "def"

Edit: Since tr is more capable than a one-to-one character map (it supports ranges and character classes, for example), also consider using re.sub.
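A few more Python 3 forms of `str.maketrans` may be worth knowing when porting `tr` calls. This is a small sketch; the variable names and sample strings are only illustrative.

```python
# Positional form: characters of the first string map to the second.
table = str.maketrans("abc", "def")
print("aabbcc".translate(table))  # ddeeff

# An optional third argument lists characters to delete, like `tr -d`.
b64 = str.maketrans("-_", "+/", "=")
print("a-b_c==".translate(b64))  # a+b/c

# The dict form allows one-to-many replacements, which tr cannot do.
expand = str.maketrans({"ß": "ss", "æ": "ae"})
print("straße".translate(expand))  # strasse
```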
There isn't a built-in equivalent, but you can get close to one with replace and a callback that looks each matched character up in an object:

    data = data.replace(/[\-_]/g, function (m) {
        return { '-': '+', '_': '/' }[m];
    });
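The same pattern (a character class plus a lookup in the replacement callback) carries over to other languages. As a comparison sketch in Python, with an assumed mapping of "-" to "+" and "_" to "/", and with `REPLACEMENTS` and `swap_chars` being illustrative names:

```python
import re

# Assumed example mapping: base64url characters back to standard base64.
REPLACEMENTS = {"-": "+", "_": "/"}


def swap_chars(data):
    # The callback receives a match object; look its text up in the dict.
    return re.sub(r"[-_]", lambda m: REPLACEMENTS[m.group(0)], data)


print(swap_chars("ab-cd_ef"))  # ab+cd/ef
```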
I recommend using the Unidecode module:

    >>> from unidecode import unidecode
    >>> unidecode(u'ıöüç')
    'iouc'

Note how you feed it a unicode string and it outputs a byte string (this is Python 2 behaviour; on Python 3 both the input and the output are str). The output is guaranteed to be ASCII.
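To see what Unidecode adds over the standard library, here is a stdlib-only approximation: decompose with NFKD, then discard whatever is not ASCII. It is a rough sketch, not a replacement, and `ascii_fold` is a made-up name.

```python
import unicodedata


def ascii_fold(text):
    """Decompose accented letters, then drop everything that is not ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(ascii_fold("ıöüç"))  # ouc
```

The dotless "ı" has no Unicode decomposition, so it is silently dropped here, whereas Unidecode maps it to "i" through its own tables.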
You can use iconv, which has a special transliteration mode. From the libiconv documentation:

    When the string "//TRANSLIT" is appended to tocode, transliteration is activated. This means that when a character cannot be represented in the target character set, it can be approximated through one or several characters that look similar to the original character.
    Source: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html

See here …
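One way to try the //TRANSLIT suffix from a script is to pipe text through the iconv command-line tool. The sketch below assumes a GNU-style `iconv` binary on the PATH and UTF-8 input; `iconv_command` and `transliterate` are hypothetical helper names.

```python
import subprocess


def iconv_command(from_code, to_code):
    """Build the argument list, appending //TRANSLIT to the target encoding."""
    return ["iconv", "-f", from_code, "-t", to_code + "//TRANSLIT"]


def transliterate(text, to_code="ASCII"):
    """Pipe text through the iconv CLI; raises CalledProcessError on failure."""
    result = subprocess.run(
        iconv_command("UTF-8", to_code),
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode(to_code)
```

The approximations chosen depend on the iconv implementation and the current locale, so the result of a call such as `transliterate("Björn")` is deliberately not shown.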
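I have done this recently in Java:

    import java.text.Normalizer;
    import java.util.regex.Pattern;

    // Combining marks plus modifier letters (Lm) and modifier symbols (Sk).
    public static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    private static String stripDiacritics(String str) {
        // NFD splits each accented letter into a base letter plus combining marks.
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        // Remove the marks, leaving only the base letters.
        str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
        return str;
    }

This does what you specified: stripDiacritics("Björn") returns "Bjorn". It fails on, for example, Białystok, because ł is a separate letter rather than a base letter with a diacritic, so NFD leaves it unchanged. If …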
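The ł limitation can be reproduced, and worked around, in a few lines of Python. This sketch only removes combining marks (it does not replicate the Lm and Sk classes of the Java pattern), and the `FALLBACK` table is a hypothetical, deliberately tiny list of letters that have no Unicode decomposition.

```python
import unicodedata

# Hypothetical fallback for letters that NFD cannot split into base + mark.
FALLBACK = str.maketrans(
    {"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "đ": "d", "Đ": "D"}
)


def strip_diacritics(text):
    """NFD-decompose, then drop characters with a non-zero combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_diacritics_with_fallback(text):
    """Same as strip_diacritics, then map the non-decomposable letters."""
    return strip_diacritics(text).translate(FALLBACK)


print(strip_diacritics("Björn"))                    # Bjorn
print(strip_diacritics("Białystok"))                # Białystok
print(strip_diacritics_with_fallback("Białystok"))  # Bialystok
```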