Go (golang.org) has pretty good Unicode support on the Windows command line. I've written about Unicode and cmd.exe before in the context of C++, C# and Java.
Versions: Windows 7 (64-bit); go1.1.1 windows/amd64; Java 1.7.0_21
Miscellaneous Computer Code
JSON documents are generally encoded using UTF-8, but the format also supports four other encoding forms. This post covers the mechanics of character encoding detection for JSON parsers that don't handle them, such as Gson and JSON.simple.
EDIT (2014): a version of this library has been published to Maven Central.
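A minimal sketch of one way to do the detection, assuming the heuristic from RFC 4627, section 3: because the first two characters of a JSON text are ASCII, the pattern of zero bytes in the first four octets identifies the encoding. This is not the library mentioned above, and the names JsonEncodingDetector, detect and reader are hypothetical. Byte order marks are not handled, and later JSON RFCs changed the rules (RFC 8259 requires UTF-8 for exchange between open systems).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper sketching the RFC 4627 null-byte heuristic:
 * the positions of zero bytes among the first four octets reveal
 * which of the five Unicode encodings was used.
 */
final class JsonEncodingDetector {
    private JsonEncodingDetector() {}

    /** Guesses the encoding from (up to) the first four bytes of the document. */
    static Charset detect(byte[] head) {
        if (head.length < 4) {
            return StandardCharsets.UTF_8; // too short to tell; fall back to UTF-8
        }
        boolean z0 = head[0] == 0;
        boolean z1 = head[1] == 0;
        boolean z2 = head[2] == 0;
        boolean z3 = head[3] == 0;
        if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE"); // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;   // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE"); // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;   // xx 00 xx 00
        return StandardCharsets.UTF_8;                                   // xx xx xx xx
    }

    /** Peeks at the first bytes of the stream, then wraps it in a decoding Reader. */
    static Reader reader(InputStream in) throws IOException {
        BufferedInputStream buffered = new BufferedInputStream(in);
        buffered.mark(4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) { // read() may return fewer bytes than requested
            int r = buffered.read(head, n, head.length - n);
            if (r < 0) {
                break; // end of stream before four bytes
            }
            n += r;
        }
        buffered.reset(); // rewind so the parser sees the whole document
        return new InputStreamReader(buffered, detect(Arrays.copyOf(head, n)));
    }

    public static void main(String[] args) throws IOException {
        byte[] doc = "{\"k\":\"v\"}".getBytes(StandardCharsets.UTF_16BE);
        System.out.println(detect(doc).name()); // prints UTF-16BE
    }
}
```

The Reader returned by reader can then be passed to any parser that accepts a java.io.Reader, so the parser only ever sees decoded characters and never has to know which encoding was on the wire.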
This is my attempt at a list of maxims to abide by when working with text in Java, in the vein of Effective Java or The Ten Commandments of Unicode. It is also a summary of another post on character encoding. The list is in no way comprehensive.
It can be tricky figuring out the difference between character handling code that works and code that just appears to work because testing did not encounter cases that exposed bugs. This is a post about some of the pitfalls of character handling in Java.
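One classic example of code that passes ASCII-only tests, chosen here for illustration (the post's own examples may differ): String.length() counts UTF-16 code units, so a supplementary character such as U+1D50A MATHEMATICAL FRAKTUR CAPITAL G counts as two. The class TruncateDemo and its methods are hypothetical.

```java
/**
 * Hypothetical illustration: truncating by char index can split a
 * surrogate pair, while truncating by code point cannot.
 */
final class TruncateDemo {
    private TruncateDemo() {}

    /** Looks correct when tested with ASCII, but can cut a surrogate pair in half. */
    static String truncateNaive(String s, int max) {
        return s.length() <= max ? s : s.substring(0, max);
    }

    /** Counts code points instead, so supplementary characters stay intact. */
    static String truncateByCodePoint(String s, int maxCodePoints) {
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return s;
        }
        return s.substring(0, s.offsetByCodePoints(0, maxCodePoints));
    }

    public static void main(String[] args) {
        String ascii = "hello";
        // "a", then U+1D50A stored as a surrogate pair, then "b": 4 chars, 3 code points
        String astral = "a\uD835\uDD0Ab";

        // With ASCII input both versions agree, so a naive test passes
        System.out.println(truncateNaive(ascii, 2).equals(truncateByCodePoint(ascii, 2))); // true

        // With the supplementary character, the naive version leaves a lone high surrogate
        String broken = truncateNaive(astral, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(broken.length() - 1))); // true

        // The code point version keeps the whole character
        System.out.println(truncateByCodePoint(astral, 2).equals("a\uD835\uDD0A")); // true
    }
}
```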
By default, Java encodes Strings sent to System.out in the default code page. On Windows XP, this means a lossy conversion to an "ANSI" code page. This is unfortunate, because the Windows Command Prompt (cmd.exe) can read and write Unicode characters. This post describes how to use JNA to work round this problem.
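The lossy conversion can be reproduced without a console. This sketch assumes windows-1252 as the "ANSI" code page (the one used on English and Western European installs; the actual default depends on the system locale) and shows that String.getBytes silently substitutes '?' for characters the code page cannot represent. It does not implement the JNA workaround, and the class name LossyConsoleDemo is hypothetical.

```java
import java.nio.charset.Charset;

/** Hypothetical demo of what happens when text is forced through an "ANSI" code page. */
final class LossyConsoleDemo {
    private LossyConsoleDemo() {}

    /** Encodes text as a stream bound to the given code page would, then decodes it again. */
    static String roundTrip(String text, Charset ansi) {
        // getBytes(Charset) never throws on unmappable characters; it substitutes
        // the charset's replacement byte ('?' for windows-1252) instead
        byte[] encoded = text.getBytes(ansi);
        return new String(encoded, ansi);
    }

    /** True if every character of the text survives encoding in the given code page. */
    static boolean canEncode(String text, Charset ansi) {
        return ansi.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252"); // assumed "ANSI" code page
        // e-acute maps to 0xE9 in windows-1252; Greek capital omega has no mapping
        String text = "caf\u00e9 \u03a9";
        System.out.println(roundTrip(text, cp1252).equals(text)); // false: omega became '?'
        System.out.println(canEncode(text, cp1252)); // false
    }
}
```

Checking with CharsetEncoder.canEncode before writing is one way to detect the loss; the replacement itself happens silently, which is why the bug is easy to miss.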
This post is a follow-up to I18N: Unicode at the Windows command prompt (C++; .Net; Java), so you might want to read that first.
Strange things can happen when working with characters. It is important to understand why problems occur and what can be done about them. This post is about getting Unicode to work at the Windows command prompt (cmd.exe).