What is the range of Unicode Printable Characters?

See http://en.wikipedia.org/wiki/Unicode_control_characters and especially the C0 and C1 control codes at http://en.wikipedia.org/wiki/C0_and_C1_control_codes. According to the wiki, the C0 control characters are U+0000 to U+001F plus U+007F (the same as the ASCII control characters), and the C1 control characters are U+0080 to U+009F. Besides the C0 and C1 control characters, Unicode also has hundreds of formatting …
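The ranges quoted above can be checked directly. The following is a minimal Python sketch; the thread itself uses PHP and R, so the language is a swap. It tests a character against the C0/C1 ranges. It also gives a rough "printable" test that rejects every character in Unicode's "C" (Other) general categories: control Cc, format Cf, surrogate Cs, private use Co, and unassigned Cn. Both function names are hypothetical, and treating all of category "C" as non-printable is an assumption, not a definition from the linked pages.

```python
import unicodedata


def is_c0_or_c1_control(ch: str) -> bool:
    """True if ch is a C0 control (U+0000 to U+001F, U+007F) or a C1 control (U+0080 to U+009F)."""
    cp = ord(ch)
    return cp <= 0x1F or cp == 0x7F or 0x80 <= cp <= 0x9F


def is_printable_approx(ch: str) -> bool:
    """Rough printability test (assumption): reject every general category starting with "C".

    That covers Cc (controls), Cf (format characters such as U+200B),
    Cs (surrogates), Co (private use) and Cn (unassigned).
    """
    return not unicodedata.category(ch).startswith("C")


# Which characters in this sample would be filtered out?
sample = "a\tb\x85\u200b"
print([hex(ord(c)) for c in sample if not is_printable_approx(c)])
```

In the Unicode database, the Cc category contains exactly the C0 and C1 ranges. The format characters the answer goes on to mention, such as the soft hyphen U+00AD, fall under Cf instead, so a range check alone does not catch them.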

UTF-8 safe equivalent of ord or charCodeAt() in PHP

mbstring version: function utf8_char_code_at($str, $index) { $char = mb_substr($str, $index, 1, 'UTF-8'); if (mb_check_encoding($char, 'UTF-8')) { $ret = mb_convert_encoding($char, 'UTF-32BE', 'UTF-8'); return hexdec(bin2hex($ret)); } else { return null; } } Using htmlspecialchars and htmlspecialchars_decode to get one character: function utf8_char_code_at($str, $index) { $char = ""; $str_index = 0; $str = utf8_scrub($str); $len = strlen($str); for ($i = …
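The mbstring version works by converting one character to UTF-32BE. That encoding stores every code point as a fixed 4-byte big-endian integer, so reading those bytes as a number (`hexdec(bin2hex(...))`) yields the code point. Below is a rough Python analogue of the same idea, not a port of the PHP function. The name `utf8_char_code_at` is reused here for illustration only. This sketch is also stricter than the PHP code: it returns `None` if any part of the input is invalid UTF-8, not just the extracted character, and it does not support negative indexes.

```python
from typing import Optional


def utf8_char_code_at(data: bytes, index: int) -> Optional[int]:
    """Return the code point of the index-th character of UTF-8 bytes, or None."""
    try:
        text = data.decode("utf-8")  # strict decoding: invalid UTF-8 raises
    except UnicodeDecodeError:
        return None
    if not 0 <= index < len(text):
        return None  # out of range (negative indexes deliberately unsupported)
    char = text[index]
    # Same trick as the PHP answer: UTF-32BE is 4 big-endian bytes per code point,
    # so interpreting those bytes as an integer gives the code point.
    return int.from_bytes(char.encode("utf-32-be"), "big")


print(utf8_char_code_at("a€b".encode("utf-8"), 1))
```

In Python itself, `ord(text[index])` on a decoded `str` gives the same result directly. The UTF-32BE round trip is shown only to mirror what the PHP code does.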

How to detect the right encoding for read.csv?

First of all, based on a more general question on StackOverflow, it is not possible to detect the encoding of a file with 100% certainty. I've struggled with this many times and have settled on a non-automatic solution. Use iconvlist to get all possible encodings: codepages <- setNames(iconvlist(), iconvlist()) Then read the data using each of them: x <- lapply(codepages, function(enc) try(read.table("encoding.asc", …
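The brute-force idea from this answer is to try every candidate encoding and then inspect the results by hand. The following Python sketch illustrates it, standing in for the R version above. The function name and the short candidate list are hypothetical, because Python has no direct equivalent of `iconvlist()`. The sketch also shows why the final choice stays manual: single-byte encodings such as Latin-1 accept almost any byte sequence, so several candidates "succeed".

```python
from typing import Dict, Iterable


def try_encodings(data: bytes, candidates: Iterable[str]) -> Dict[str, str]:
    """Decode data with each candidate encoding; keep only those that do not fail."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this encoding, or an unknown encoding name.
            continue
    return results


data = "Größe".encode("utf-8")
for enc, text in try_encodings(data, ["ascii", "utf-8", "latin-1", "cp1252"]).items():
    print(enc, repr(text))
```

Here ASCII is rejected, but UTF-8, Latin-1 and cp1252 all decode without error. Only the UTF-8 result reads correctly, which is the kind of judgment the answer leaves to a human.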

Detect file encoding in PHP

Try using the mb_detect_encoding function. This function will examine your string and attempt to “guess” what its encoding is. You can then convert it as desired. As brulak suggested, however, you’re probably better off converting to UTF-8 rather than from, to preserve the data you’re transmitting.
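The advice to guess the encoding and then convert *to* UTF-8 can be sketched as a first-match search over an ordered candidate list, followed by re-encoding. This is a simplified Python illustration, not a model of `mb_detect_encoding` itself; PHP's detection heuristics are more involved and have changed between versions. The function name and default candidate order are assumptions. UTF-8 goes first because strict UTF-8 decoding rejects most non-UTF-8 text, whereas single-byte encodings rarely fail.

```python
from typing import Iterable, Optional, Tuple


def detect_and_convert(
    data: bytes, candidates: Iterable[str] = ("utf-8", "cp1252", "latin-1")
) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (first encoding that decodes data strictly, data re-encoded as UTF-8)."""
    for enc in candidates:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
        return enc, text.encode("utf-8")  # normalize to UTF-8 to preserve the characters
    return None, None  # no candidate matched


print(detect_and_convert("Größe".encode("cp1252")))
```

A wrong guess that still decodes, for example cp1252 text that happens to be valid UTF-8, is not detected here. That is the same uncertainty noted in the read.csv answer.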