Passing command line unicode argument to Java code

Unfortunately you cannot reliably use non-ASCII characters with command-line apps that use the Windows C runtime’s stdlib, like Java (and pretty much all non-Windows-specific scripting languages really).

This is because they read their input and output using a locale-specific code page by default, which is never a UTF, unlike every other modern OS which uses UTF-8.

Whilst you can change the code page of a terminal to something else using the chcp command, the support for the UTF-8 encoding under chcp 65001 is broken in a few ways that are likely to trip apps up fatally.

If you only need Japanese you could switch to code page 932 (similar to Shift-JIS) by setting your locale (‘language for non-Unicode applications’ in the Regional settings) to Japan. This will still fail for characters that aren’t in that code page though.

If you need to get non-ASCII characters through the command line reliably on Windows, you need to call the Win32 API function GetCommandLineW directly to avoid the encode-to-system-code-page layer. Probably you’d want to do that using JNA.

Leave a Comment