Is a BLOB converted using the current/default charset in MySQL?

Short Answer:

Simply delete or comment out line below, and it will always work, no matter which database encoding is really in use (utf8, latin1, etc):

$pdo->exec('SET CHARACTER SET utf8');

Long Answer:

This is not PDO bug, this is MySQL bug.

When actual database encoding is latin1, but you use:

SET CHARACTER SET utf8

(or vice versa: actual is utf8, but you use latin1 – important part is that it is different), then, as far as I can tell, MySQL will try to perform charset conversion for all traffic between client and server (even for BLOB!).

If you DO NOT use SET CHARACTER SET statement, from what I see for scripts (PHP/PDO or Perl/DBI) connection charset by default is set to be the database charset, and in that case no implicit conversion takes place.

Obviously, this automatic conversion is what kills BLOBs, which do not want any conversion to happen.

I have tested this on both PHP/PDO and Perl/DBI, and issue is easily reproducible: both will fail if using database with latin1 encoding and using SET CHARACTER SET utf8 (or vice versa).

If you want to be fully UTF8 compatible, you should change encoding of your database using:

ALTER DATABASE mydb CHARSET utf8;

With this, everything will be using UTF8, and BLOBs will also work fine.

Minimal file that causes this corruption problem is blob.bin with single byte 0xFF. On Linux, you can create this test file using printf command:

printf "0xFF" > blob.bin

Now, test scripts that reproduce the problem:

PHP test code:

<?php
$dbh = new PDO("mysql:host=127.0.0.1;dbname=test");
# If database encoding is NOT utf8, uncomment to break it:
# $dbh->exec("SET CHARACTER SET utf8");

$blob1 = file_get_contents("blob.bin");
$sth = $dbh->prepare(
    "INSERT INTO pdo_blob (the_blob) VALUES(:the_blob)"
);
$sth->bindParam(":the_blob", $blob1, PDO::PARAM_LOB);
$sth->execute();

$sth = $dbh->prepare(
    "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
);
$sth->execute();

$blob2 = null;
$sth->bindColumn(1, $blob2, PDO::PARAM_LOB);
$sth->fetch();

if ($blob1 == $blob2) {
    echo "Equal\n";
} else {
    echo "Not equal\n";
    $arr1 = str_split($blob1);
    $arr2 = str_split($blob2);
    $i=0;
    for ($i=0; $i<count($arr1); $i++) {
        if ($arr1[$i] != $arr2[$i]) {
            echo "First diff: " . dechex(ord($arr1[$i])) . " != "
                                . dechex(ord($arr2[$i])) . "\n";
            break;
        }
    }
}
?>

Perl test code:

#!/usr/bin/perl -w

use strict;
use DBI qw(:sql_types);

my $dbh = DBI->connect("dbi:mysql:host=127.0.0.1;dbname=test");
# If database encoding is NOT utf8, uncomment to break it:
# $dbh->do("SET CHARACTER SET utf8");
open FILE, "blob.bin";
binmode FILE;
read(FILE, my $blob1, 100000000);
close FILE;
my $sth = $dbh->prepare(
    "INSERT INTO pdo_blob (the_blob) VALUES(?)"
);
$sth->bind_param(1, $blob1, SQL_BLOB);
$sth->execute();
my ($blob2) = $dbh->selectrow_array(
    "SELECT the_blob FROM pdo_blob ORDER BY id DESC LIMIT 1"
);
print ($blob1 eq $blob2 ? "Equal" : "Not equal") , "\n";

Leave a Comment