How should I use g++’s -finput-charset compiler option correctly in order to compile a non-UTF-8 source file?

Encoding Blues: You cannot use UTF-16 for source code files, because the header you are including, <iostream>, is not UTF-16-encoded. Since #include inserts files verbatim, you suddenly have a UTF-16-encoded file with a large chunk (approximately 4k, apparently) of invalid data. There is almost no good reason to ever use UTF-16 … Read more
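
The effect of pasting one-byte-per-character header text into a UTF-16 file can be sketched outside the compiler. This is only an illustration in Python, not what g++ does internally; header_bytes and source_bytes are hypothetical stand-ins for <iostream> and a user source file.

```python
def decode_as_utf16(data: bytes) -> str:
    """Decode bytes as little-endian UTF-16, marking undecodable units with U+FFFD."""
    return data.decode("utf-16-le", errors="replace")


# Hypothetical stand-in for a system header: plain one-byte-per-character text.
header_bytes = b"// stand-in for <iostream>\n"

# Hypothetical user source file saved as UTF-16 (two bytes per character here).
source_bytes = "int main() { return 0; }\n".encode("utf-16-le")

# Mimic #include: the header bytes are pasted verbatim in front of the source.
combined = header_bytes + source_bytes

# The source alone decodes cleanly; the combined bytes do not, because every
# pair of header bytes is read as a single, unrelated UTF-16 code unit.
print(ascii(decode_as_utf16(source_bytes)))
print(ascii(decode_as_utf16(combined)))
```

The second line of output contains no recognizable header text, which is the "large chunk of invalid data" described above.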

‘git log’ output encoding issues in Windows 10 CLI terminal

Okay, I experimented a bit and found out that Git commands on Windows actually need UNIX-style environment variables like LC_ALL in order to display Polish (or other UTF-8-encoded) characters correctly. Just run set LC_ALL=C.UTF-8 and then enjoy the result. Here is what happened on my console (font “Consolas”, no chcp necessary): Update: Well, in order for … Read more

MySQL: Get character-set of database or table or column?

Here’s how I’d do it:

For Schemas (or Databases; they are synonyms): SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "mydatabasename";

For Tables: SELECT CCSA.character_set_name FROM information_schema.`TABLES` T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "mydatabasename" AND T.table_name = "tablename";

For Columns: SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_schema = "mydatabasename" AND table_name … Read more

Decoding numeric html entities via PHP

html_entity_decode already does what you’re looking for: $string = '&#146;'; echo html_entity_decode($string, ENT_COMPAT, 'UTF-8'); It will return the character with binary hex c292, which is PRIVATE USE TWO (U+0092). As it’s private use, your PHP configuration/version/compile might not return it at all. Also there are some more quirks: But in HTML (other than XHTML, which … Read more
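
To make the byte values concrete, here is a small Python sketch (not PHP, so it does not reproduce html_entity_decode itself). It checks that the literal code point 146 encodes to the c292 bytes mentioned above, and shows one known wrinkle with this reference: decoders that follow the HTML5 rules, such as Python's html.unescape, remap &#146; to the Windows-1252 character at byte 0x92 instead.

```python
import html

reference = "&#146;"

# The literal code point 146 (U+0092) is two bytes in UTF-8: C2 92.
literal = chr(146)
print(literal.encode("utf-8").hex())

# html.unescape applies the HTML5 rules for character references, which
# map this particular reference to U+2019 (RIGHT SINGLE QUOTATION MARK).
html5_result = html.unescape(reference)
print(ascii(html5_result))

# U+2019 is what byte 0x92 means in Windows-1252, which is where the
# remapping comes from.
print(ascii(b"\x92".decode("cp1252")))
```

So the same reference can legitimately decode to two different characters depending on which rule set the decoder applies.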

Detect the URI encoding automatically in Tomcat

The complicated way to achieve my goal was indeed to write my own javax.servlet.Filter and to embed it into the filter chain. This solution complies with the Apache Tomcat suggestion provided in Tomcat Wiki – Character Encoding Issues. Update (2010-07-31): The first version of this filter interpreted the query string itself, which was a bad … Read more
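
The excerpt above does not show the filter itself, so the following is only a sketch of one common detection heuristic, written in Python instead of as a javax.servlet.Filter: percent-decode the raw value to bytes, try UTF-8 first, and fall back to a single-byte encoding if the bytes are not valid UTF-8. The function name decode_query_value and the ISO-8859-1 fallback are assumptions for illustration, not the author's code.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw: str, fallback: str = "iso-8859-1") -> str:
    """Percent-decode a raw query value and guess its character encoding.

    Hypothetical helper: tries strict UTF-8 first and uses the fallback
    encoding only when the bytes are not valid UTF-8. It does not treat
    '+' as a space.
    """
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


print(ascii(decode_query_value("caf%C3%A9")))  # UTF-8 encoded value
print(ascii(decode_query_value("caf%E9")))     # ISO-8859-1 encoded value
```

The heuristic works because most non-ASCII text in a single-byte encoding is not valid UTF-8, but it is a guess: short byte sequences can be valid in both encodings.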

UTF-8 not working in HTML forms

In your HTML, add this meta tag: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> Also add this PHP header at the top of the script: header("Content-Type: text/html;charset=UTF-8"); [EDIT]: One more tip is to save the file as UTF-8 without BOM encoding. You can use Notepad++ or any decent editor to do that.
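
If an editor is not at hand, the "UTF-8 without BOM" step can also be scripted. A minimal Python sketch, assuming the file content has already been read as bytes; strip_utf8_bom is a hypothetical helper name. The BOM matters for PHP in particular because any bytes before the opening <?php tag are sent as output, which can prevent a later header() call from taking effect.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark (EF BB BF), if any."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file content that was saved with a BOM.
with_bom = codecs.BOM_UTF8 + b"<?php header('Content-Type: text/html;charset=UTF-8');"
print(strip_utf8_bom(with_bom)[:5])
```

Writing the returned bytes back to the file leaves its text unchanged and only removes the three marker bytes.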