PHP/PDO/MySQL: inserting into MEDIUMBLOB stores bad data

This seems like a bug to me: why should the charset of the connection have any effect on data for a binary column, particularly when it’s been identified as binary to PDO itself with PARAM_LOB?

I do not think that this must be a bug. I can imagine that whenever the client talks with the server and says that the following command is in UTF-8 and the server needs it in Latin-1, then the query might get re-encoded prior parsing and execution. So this is an encoding issue for the transportation of the data. As the whole query prior parsing will get influenced by this re-encoding, the binary data for the BLOB column will get changed as well.

From the Mysql manual:

What character set should the server translate a statement to after receiving it?

For this, the server uses the character_set_connection and collation_connection system variables. It converts statements sent by the client from character_set_client to character_set_connection (except for string literals that have an introducer such as _latin1 or _utf8). collation_connection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

Or on the way back: Latin1 data from the store will get converted into UTF-8 because the client told the server that it prefers UTF-8 for the transportation.

The identifier for PDO itself you name looks like being something entirely different:

PDO::PARAM_LOB tells PDO to map the data as a stream, so that you can manipulate it using the PHP Streams API. (Ref)

I’m no MySQL expert but I would explain it this way. Client and server need to negotiate which charsets they are using and I assume they do this for a reason.

Leave a Comment