What is the range of Unicode Printable Characters?

See http://en.wikipedia.org/wiki/Unicode_control_characters for an overview. You may especially want to look at the C0 and C1 control characters: http://en.wikipedia.org/wiki/C0_and_C1_control_codes
According to Wikipedia, the C0 control characters occupy U+0000 to U+001F plus U+007F (the same control characters as ASCII), and the C1 control characters occupy U+0080 to U+009F. Other than the C0 and C1 control characters, Unicode also has hundreds of formatting …
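
A minimal Python sketch of the ranges quoted above. The names `C0_CONTROLS`, `C1_CONTROLS`, `is_c0_or_c1` and `control_code_points` are illustrative, not from any library. The sketch cross-checks the ranges against the `Cc` (Control) general category reported by the standard `unicodedata` module. It also shows that Python's own `str.isprintable()`, which is a Python definition rather than a Unicode one, rejects format characters such as U+200B ZERO WIDTH SPACE as well.

```python
import sys
import unicodedata

# C0 controls: U+0000-U+001F plus DEL (U+007F); C1 controls: U+0080-U+009F.
# These names are illustrative, not part of any library.
C0_CONTROLS = set(range(0x00, 0x20)) | {0x7F}
C1_CONTROLS = set(range(0x80, 0xA0))


def is_c0_or_c1(cp: int) -> bool:
    """Return True if code point cp is a C0 or C1 control character."""
    return cp in C0_CONTROLS or cp in C1_CONTROLS


def control_code_points() -> list:
    """All code points whose Unicode general category is Cc (Control)."""
    return [cp for cp in range(sys.maxunicode + 1)
            if unicodedata.category(chr(cp)) == "Cc"]


if __name__ == "__main__":
    cc = control_code_points()
    # The Cc category and the C0/C1 ranges describe the same 65 code points.
    print(len(cc), cc == sorted(C0_CONTROLS | C1_CONTROLS))
    # Format characters (category Cf) are not controls, but Python's
    # str.isprintable() still treats them as non-printable.
    print(unicodedata.category("\u200b"), "\u200b".isprintable())
```

Scanning all of range(0x110000) takes about a second; it is done here only to show that the explicit ranges and the `Cc` category agree.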

Python 3.0: how to make print() output Unicode?

The Windows command prompt (cmd.exe) cannot display the Unicode characters you are using, even though Python handles them correctly internally. You need to use IDLE, Cygwin, or another program that can display Unicode correctly. See this thread for a full explanation: http://www.nabble.com/unable-to-print-Unicode-characters-in-Python-3-td21670662.html
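
A minimal sketch, assuming the immediate goal is to stop print() from raising `UnicodeEncodeError` on a limited console rather than to make the console show every character. It encodes with the console's encoding and `errors="replace"`, so unrepresentable characters become `?`. The helper names `console_preview` and `safe_print` are hypothetical.

```python
import sys


def console_preview(text: str, encoding: str) -> str:
    """Return text as it survives a round trip through `encoding`.

    Characters the encoding cannot represent become '?' instead of raising
    UnicodeEncodeError.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str) -> None:
    """Print text to stdout, degrading characters the console cannot encode."""
    # sys.stdout.encoding may be None when stdout has been replaced (e.g. StringIO).
    encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    print(console_preview(text, encoding))


if __name__ == "__main__":
    safe_print("caf\u00e9 \u2603")
    print(console_preview("caf\u00e9 \u2603", "ascii"))  # caf? ?
```

This only hides the problem, because the replaced characters are still lost. Later Python versions changed the picture: since Python 3.6 (PEP 528) the interpreter writes to the Windows console through its Unicode API, and Python 3.7 added `sys.stdout.reconfigure()` for changing a stream's `errors` handler in place.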

Printing Unicode characters to the PowerShell prompt

This is not a PowerShell deficiency. It is a deficiency of the Windows console subsystem, which PowerShell.exe uses. The console subsystem does not support Unicode; it uses code pages instead, a mechanism that dates back to the DOS days. In PowerShell V2, the fix is the PowerShell Integrated Scripting Environment (PowerShell_ISE.exe). This is a graphical app …
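
To illustrate what "code pages instead of Unicode" means, here is a small Python sketch using Python's built-in `cp437` (the original IBM PC/DOS code page) and `cp1252` (Windows Western European) codecs. A single-byte code page maps each byte to one of at most 256 characters, so the same byte can mean different characters under different code pages, and anything outside the table cannot be shown at all. The helper `representable` is hypothetical, and the choice of these two code pages is only an example.

```python
def representable(text: str, codepage: str) -> bool:
    """Return True if every character in text exists in the given code page."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    raw = bytes([0x82])
    # The same byte means different characters under different code pages:
    # cp437 decodes it as U+00E9, cp1252 as U+201A.
    print(raw.decode("cp437"), raw.decode("cp1252"))
    # The euro sign exists in cp1252 but not in the DOS-era cp437.
    print(representable("\u20ac", "cp1252"), representable("\u20ac", "cp437"))
```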

List of all Unicode open/close brackets?

There is a plain-text database of information about every Unicode character, available from the Unicode Consortium; its format is described in Unicode Standard Annex #44. The primary information is contained in UnicodeData.txt. Open and close punctuation characters are denoted by Ps (punctuation start) and Pe (punctuation end) in the General_Category field (the third field, delimited by …
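
A minimal Python sketch of the lookup described above. Rather than downloading UnicodeData.txt, it reads the same General_Category values from Python's standard `unicodedata` module, which is generated from the Unicode Character Database at whatever Unicode version your Python ships. The function name `brackets_by_category` is hypothetical.

```python
import sys
import unicodedata


def brackets_by_category():
    """Return (opening, closing) lists of characters in categories Ps and Pe."""
    opening, closing = [], []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        category = unicodedata.category(ch)
        if category == "Ps":  # Open_Punctuation
            opening.append(ch)
        elif category == "Pe":  # Close_Punctuation
            closing.append(ch)
    return opening, closing


if __name__ == "__main__":
    opening, closing = brackets_by_category()
    print(f"Unicode {unicodedata.unidata_version}: "
          f"{len(opening)} Ps, {len(closing)} Pe")
    for ch in opening[:5]:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

The category alone does not say which closing character matches which opening one, and the two lists need not pair up one-to-one. The Unicode Character Database records matching bracket pairs separately (the Bidi_Paired_Bracket property in BidiBrackets.txt).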

Can UTF-8 contain a zero byte?

Yes, but only for code point 0 (NUL), whose UTF-8 encoding is the single byte 0x00. No other Unicode code point is encoded in UTF-8 with a zero byte anywhere within it. The possible code points and their UTF-8 encodings are:

Range              Encoding  Binary value
-----------------  --------  -----------------
U+000000-U+00007f  0xxxxxxx  0xxxxxxx

U+000080-U+0007ff  110yyyxx  00000yyy xxxxxxxx
                   10xxxxxx
…
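
The table explains why: single bytes of the form 0xxxxxxx are used only for U+0000 to U+007F, while every byte of a multi-byte sequence starts with the bits 11 (lead byte) or 10 (continuation byte), so it can never be 0x00. The Python sketch below encodes the two-byte row of the table by hand, checks it against the built-in codec, and brute-forces every encodable code point. Surrogates (U+D800 to U+DFFF) are skipped because Python's strict UTF-8 codec refuses to encode them. The helper names are illustrative.

```python
def encode_two_byte(cp: int) -> bytes:
    """Encode U+0080..U+07FF by hand: 110yyyxx 10xxxxxx."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError(f"U+{cp:04X} does not use the two-byte form")
    # Lead byte carries the top 5 bits, continuation byte the low 6 bits.
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def code_points_with_zero_byte() -> list:
    """Return every code point whose UTF-8 encoding contains a 0x00 byte."""
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # lone surrogates cannot be encoded in UTF-8
        if 0 in chr(cp).encode("utf-8"):
            hits.append(cp)
    return hits


if __name__ == "__main__":
    print(encode_two_byte(0xE9).hex())  # c3a9
    print(encode_two_byte(0xE9) == "\u00e9".encode("utf-8"))  # True
    print(code_points_with_zero_byte())  # [0]
```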