PHP & SQL Injection – UTF8 POC

Update 2:

Further research suggests that MySQL versions prior to 5.0.77 may be vulnerable to the GBK issue when SET NAMES alone is used to set the connection character set. It was earlier believed that only 5.0.22 and earlier were vulnerable.

This means that if you are using a PHP version prior to 5.2.3, in which mysql_set_charset was introduced (mysqli_set_charset has been available since 5.0.5), your code may be vulnerable under specific, well-crafted conditions.

If you’re stuck on PHP 5.1, please ensure that you are using MySQL 5.0.77 or later. 5.0.77 is “only” two years old, but it has been pushed into the repositories for RHEL/CentOS 5.x, the more popular distributions stuck with the 5.0.x series of MySQL and the 5.1.x series of PHP.

Get upgrading, people!


Update 1:

Another recent question has uncovered the source of the GBK thing: a bugfix in MySQL 5.0.22. Versions earlier than this are severely vulnerable unless mysql_real_escape_string is combined with mysql_set_charset rather than just SET NAMES. The mysqli equivalent is named mysqli_set_charset.
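As a rough sketch of what that combination looks like with mysqli: the helper names open_connection and quote_value are made up for illustration, and any host, credentials, or charset passed in are placeholders rather than anything from the original question.

```php
<?php
// Hypothetical helper: open a connection and set its character set
// through the client library rather than with a bare SET NAMES query.
function open_connection($host, $user, $pass, $db, $charset)
{
    $link = mysqli_connect($host, $user, $pass, $db);
    if (!$link) {
        throw new RuntimeException('Connect failed: ' . mysqli_connect_error());
    }
    // mysqli_set_charset() informs the client library as well as the server,
    // so mysqli_real_escape_string() can escape with that charset in mind.
    if (!mysqli_set_charset($link, $charset)) {
        throw new RuntimeException('Charset failed: ' . mysqli_error($link));
    }
    return $link;
}

// Hypothetical helper: quote one string value for concatenation into SQL.
function quote_value($link, $value)
{
    return "'" . mysqli_real_escape_string($link, $value) . "'";
}
```

The point of the sketch is only the ordering: set the charset through the extension first, then escape on that same connection.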

There does not appear to be an equivalent of mysql_set_charset in PDO. This may be either because it can use MySQL native prepared statements, which may be immune from the problem, or because SET NAMES is enough for its underlying escaping mechanism to work as expected.
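For what a PDO prepared statement looks like in practice, here is a minimal sketch. It assumes the pdo_sqlite driver and uses an in-memory SQLite database purely so it runs without a MySQL server; the users table and its contents are invented for the example.

```php
<?php
// In-memory SQLite is an assumption made only to keep the sketch runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');
$pdo->exec("INSERT INTO users (name) VALUES ('alice')");

// A classic injection attempt, passed as a bound value instead of
// being concatenated into the SQL text.
$input = "' OR '1'='1";

$stmt = $pdo->prepare('SELECT COUNT(*) FROM users WHERE name = ?');
$stmt->execute(array($input));

// The value is compared as plain data, so no row matches.
$matches = (int) $stmt->fetchColumn();
```

With the MySQL driver, whether the prepare is native or emulated is controlled by PDO::ATTR_EMULATE_PREPARES, so the same code may or may not go through the escaping path discussed here.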

Regardless, if you’re using any MySQL version prior to 5.0.77 and are not taking extreme care to ensure that you’re only passing in strings in a known character set, you may find yourself open to attack.

I’m leaving the rest of my original post unmodified, but I have updated the tl;dr.


“There is a lot of talk about how addslashes and the mysql_real_escape_string function are not safe for preventing injections.”

This is half correct. addslashes is entirely the wrong thing to use to protect against SQL injection because it is not guaranteed to provide the right escaping for every database: it always adds backslashes, and some databases use an entirely different escaping mechanism.
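A small illustration of the mismatch, assuming a database that follows standard SQL quoting, where a single quote inside a string is escaped by doubling it. The str_replace call is there only to show the expected form, not as a recommended escaping routine.

```php
<?php
$name = "O'Reilly";

// addslashes() always prefixes the quote with a backslash: O\'Reilly
$backslashed = addslashes($name);

// Standard SQL expects the quote to be doubled instead: O''Reilly
$doubled = str_replace("'", "''", $name);
```

A database that does not treat the backslash as an escape character reads the first form as a string that ends right after the backslash, which is exactly the hole the escaping was meant to close.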

If you’re stuck in the ghetto of the prehistoric lump of crap known as the “mysql” extension (instead of using PDO or mysqli), mysql_real_escape_string is some of the best protection you’ve got when you need to concatenate together some SQL.

“I know there are some particular scenarios where the GBK charset or utf8_decode can be used to inject some SQL code.”

You’re probably thinking of malformed UTF-8 sequences, but I’ve only ever seen those used as an XSS mechanism, never as an SQL injection mechanism. Running strings through iconv with //IGNORE//TRANSLIT should be good enough protection. It usually truncates the string at the point of the bad sequence, which is an acceptable failure mode when you’re being attacked, since malformed sequences should never happen in legitimate requests.

Further, while there are plenty of “quote” characters in non-Latin languages, MySQL is pretty decent at only actually obeying the backtick and double quote for identifiers and the single quote for string values.

Thinking about it more, perhaps there’s some sequence of bytes in one character set that includes a single quote in the middle when read as a different character set. It’s very, very likely that addslashes is entirely ignorant of character sets and just works on the raw bytes. It’d stick a backslash in the middle of a sequence and blow it up. However, that should just result in a whine somewhere along the line about bad character set information.
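The raw-byte behaviour is easy to check without a database. The sketch below uses the byte pair usually cited for the GBK issue mentioned in the updates; the variable names are only for illustration.

```php
<?php
// 0xBF followed by a single quote (0x27), as raw bytes.
$input = "\xBF\x27";

// addslashes() knows nothing about character sets, so it inserts a
// backslash (0x5C) directly in front of the 0x27 byte.
$escaped = addslashes($input);

// Hex dump of the result: bf5c27
$hex = bin2hex($escaped);
```

In GBK, 0xBF 0x5C is a valid two-byte character, so anything reading those bytes as GBK consumes the backslash as the second half of that character and sees the 0x27 quote standing on its own.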

mysql_real_escape_string, on the other hand, is designed with knowledge of the connection’s character set built in, so it won’t escape a byte inside a multi-byte sequence when it sees the sequence rather than a quote. Because it recognizes the sequence as a sequence instead of as a quote, there’s no danger at all.

Ultimately if you think this is a problem, it’s your responsibility to ensure that you accept input in only the expected character sets, and transform all input to your desired character set if there’s a mismatch. This will rarely if ever trip up a legitimate request.
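One way to enforce that, sketched under the assumption that the application expects UTF-8 and has the mbstring extension loaded. The helper name require_utf8 is hypothetical, and rejecting bad input outright is a stricter choice than stripping bad sequences with iconv.

```php
<?php
// Hypothetical helper: return the value only if it is a well-formed
// UTF-8 string, otherwise null so the caller can reject the request.
function require_utf8($value)
{
    if (!is_string($value) || !mb_check_encoding($value, 'UTF-8')) {
        return null;
    }
    return $value;
}

// Well-formed: "caf" followed by the two-byte sequence for e-acute.
$ok = require_utf8("caf\xC3\xA9");

// Malformed: 0xC3 starts a two-byte sequence, but 0x28 cannot continue it.
$bad = require_utf8("abc\xC3\x28");
```

Run on every incoming string before it reaches any escaping or query code, a check like this keeps unexpected byte sequences from getting that far.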


tl;dr: Not a concern unless you’re using a really old MySQL version and/or aren’t making sure your data is in a known-good character set. Always use database-specific escape mechanisms for maximum safety, and always assume the user is out to get you.
