Using char16_t and char32_t in I/O

In the proposal "Minimal Unicode support for the standard library (revision 2)", it is noted that the Library Working Group (LWG) only supported adding the new character types to strings and codecvt facets. Apparently the majority was opposed to supporting them in iostream, fstream, facets other than codecvt, and regex.

According to the minutes of the 2006 Portland meeting, “the LWG is committed to full support of Unicode, but does not intend to duplicate the library with Unicode character variants of existing library facilities.” I haven’t found any details; however, I would guess that the committee considers the current library interface inappropriate for Unicode. One possible complaint is that the interface was designed with fixed-size characters in mind. Unicode makes that assumption obsolete: although Unicode data can be stored as fixed-size code points, a single character is not limited to a single code point.

Personally, I think there’s no reason not to standardize the minimal support that is already provided on various platforms (Windows uses UTF-16 for wchar_t; most Unix platforms use UTF-32). More advanced Unicode support will require new library facilities, but supporting char16_t and char32_t in iostreams and facets wouldn’t get in the way, and it would enable basic Unicode I/O.
