Special characters in Visual Studio 2019 C++ project AND executing CMD commands with them

I already answered this in detail under your other question. Here is a similar explanation.

Your program can use UTF-8 internally while the console uses a different encoding, but you have to tell the standard library how each data source is encoded.
Of course, if the destination encoding does not support a specific character, some fallback has to kick in (see the example at the bottom).

There are 4 areas where the encoding must be well defined for everything to work:

  • Your source code. By default VS picks the source encoding from the system locale, and this is bad. Force VS and all your editors to use a universal encoding; UTF-8 is the best choice. It is also best to tell the compiler how the source is encoded: cl /source-charset:utf-8 .....
  • Your executable. You have to define which encoding string literals are stored in inside the final executable. UTF-8 is also the best choice here: cl .... /execution-charset:utf-8 .....
  • When you run the application, you have to tell the standard library which encoding your string literals are in, or which encoding the program logic uses. So somewhere at the beginning of execution you need something like this:
    std::locale::global(std::locale{".utf-8"});
  • And finally, you have to tell each stream which encoding it should use. For std::cout and std::cin, set the locale that is the default for the system:
    auto streamLocale = std::locale{""};
    // this impacts date/time/floating point formats, so you may want to tweak it to take only the encoding from here and keep the C locale for formatting
    std::cout.imbue(streamLocale);
    std::cin.imbue(streamLocale);

With all four in place, everything should work as desired without any code that converts encodings explicitly.
Because there are 4 places to make a mistake, people often have trouble with this, and the internet is full of “hack” solutions.
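
One of those four places, the execution charset, can be checked at compile time. The sketch below assumes that U+0105 (ą) is a reasonable probe: it takes two bytes in UTF-8 but only one in code page 1250. The function name literalsAreUtf8 is hypothetical and not part of the test program.

```cpp
#include <cstring>

// "\u0105" is ą written as a universal character name, so this probes only the
// execution charset. In UTF-8 the literal is 2 bytes plus the terminating NUL.
static_assert(sizeof("\u0105") == 3,
              "string literals do not look UTF-8 encoded; check /execution-charset");

// Hypothetical runtime variant: compare the literal with the expected UTF-8 bytes.
bool literalsAreUtf8()
{
    return std::memcmp("\u0105", "\xC4\x85", 3) == 0;
}
```

This does not verify /source-charset, because the probe deliberately avoids a raw non-ASCII character in the source file.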

Here is a test program to prove my point:

#include <iostream>
#include <locale>
#include <exception>
#include <string>

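// argv[1] (optional): name of the global locale, e.g. ".65001"
// argv[2] (optional): name of the locale imbued into std::cout and std::cin
// Both default to the system locale ("").
void setupLocale(int argc, const char *argv[])
{
    std::locale def{""};
    std::locale::global(argc > 1 ? std::locale{argv[1]} : def);
    auto streamLocale = argc > 2 ? std::locale{argv[2]} : def;
    std::cout.imbue(streamLocale);
    std::cin.imbue(streamLocale);
}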

void printSeparator()
{
    std::cout << "---------\n";
}

void printTestStuff()
{
    std::cout << "Wester Europe: āāāčččēēēēßÞÖöñÅÃ\n";
    std::cout << "Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă\n";
    std::cout << "China: 字集碼是把字符集中的字符编码为指定集合中某一对象\n";
    std::cout << "Korean: 줄여서 인코딩은 사용자가 입력한\n";
}

int main(int argc, const char *argv[]) {
    try{
        setupLocale(argc, argv);
        printSeparator();
        printTestStuff();
        printSeparator();
    }
    catch(const std::exception& e)
    {
        std::cerr << e.what() << '\n';
    }
}

And here is how it was built and run to show that it works (note that this also covers scenarios where a wrong encoding is used):

C:\Users\User\Downloads>cl /source-charset:utf-8 /execution-charset:utf-8 /EHsc encodings.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.28.29336 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

encodings.cpp
Microsoft (R) Incremental Linker Version 14.28.29336.0
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:encodings.exe
encodings.obj

C:\Users\User\Downloads>chcp
Active code page: 437

C:\Users\User\Downloads>encodings.exe
---------
Wester Europe: Ä?Ä?Ä?Ä?Ä?Ä?Ä"Ä"Ä"Ä"AYAzA-AA±A.Aƒ
Central Europe: Ä.Ä,A"A3Å?Å,Ä~ÄTżÄ╪źŰűA?A½Ä,ă
China: å--é>+碼æ~_æSSå--ç¬▌é>+ä,-çs,å--ç¬▌ç¼-ç ?ä,ºæO╪årsé>+å?^ä,-æY?ä,?å_1象
Korean: ì,ì-¬ì,o ì?,ì½"ë"cì?? ì,¬ìscìz?ê°? ìz.ë ¥ío
---------

C:\Users\User\Downloads>encodings.exe .65001
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>encodings.exe .65001 .437
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>encodings.exe .65001 .1250
---------
Wester Europe: aaaccceeeeß_ÖöñÅA
Central Europe: aAOóLlEezczUuYyAa
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>chcp 1250
Active code page: 1250

C:\Users\User\Downloads>encodings.exe .65001 .1250
---------
Wester Europe: aaačččeeeeß?ÖönAA
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: ????????????????????????
Korean: ??? ???? ???? ???
---------

C:\Users\User\Downloads>chcp 65001
Active code page: 65001

C:\Users\User\Downloads>encodings.exe
---------
Wester Europe: ÄÄÄÄÄÄēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóÅłĘężćźŰűÃýĂă
China: 字集碼是把字符集中的字符编ç ä¸ºæŒ‡å®šé›†åˆä¸­æŸä¸€å¯¹è±¡
Korean: 줄여서 ì¸ì½”ë”©ì€ ì‚¬ìš©ìžê°€ 입력한
---------

C:\Users\User\Downloads>encodings.exe .65001
---------
Wester Europe: āāāčččēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: 字集碼是把字符集中的字符编码为指定集合中某一对象
Korean: 줄여서 인코딩은 사용자가 입력한
---------

C:\Users\User\Downloads>encodings.exe .65001 .65001
---------
Wester Europe: āāāčččēēēēßÞÖöñÅÃ
Central Europe: ąĄÓóŁłĘężćźŰűÝýĂă
China: 字集碼是把字符集中的字符编码为指定集合中某一对象
Korean: 줄여서 인코딩은 사용자가 입력한
---------

C:\Users\User\Downloads>
