Synchronizing an SQLite client database with a MySQL server database

Wel you realize this is a non trivial problem. I wrote a library to accomplish this for a commercial app last year and it took about 6 months to get it to where I was happy with it.

Leaving aside the argument for using port 80 and HTTP (TCP/IP) to avoid firewall and support issues, you need to design a protocol. Since my project was very data intesive I went with a binary protocol (rather than the bloated xml) that could handle any data. I also wanted it to be bi-directional so that I could INSERT data as well as execute requests. I used CGI/FastCGI on the server.

The binary protocol i designed is quite simple (always better) and breaks large transfers into chunks of a user defined size (about 600k seems to be good). Each chunk has a header followed by the data.

Although this protocol can be used for transmitting any kind of data, typically it is used for database style data as your question suggests. To acomodate this, I decided to use a rows/colums approach to the design. The data is stored one row at a time meaning, each of the columns is stored for row one, then all the columns for row 2 … row n.

The format of a single columns data is:

' Col1Type          1Bytes - BYTE     ' Data Type (REMSQL_TEXT etc)                
' Col1Len           4Bytes - DWORD    ' Length in bytes the Column Data                            - up to 4.2GB
' Col1Data          nBytes - BYTE     ' String data  

(in C, a BYTE is CHAR)

This means that each column has a data type descriptor. All datatypes can be represented with:

REMSQL_NONE = 0    ' DataType undefined
REMSQL_QUAD = 1    ' 64-bit signed integer                
REMSQL_DBLE = 2    ' 64-bit IEEE floating point number
REMSQL_TEXT = 3    ' STRING - (CHAR) string of Ascii Bytes                                     
REMSQL_BLOB = 4    ' BLOB - (CHAR) string of Binary Bytes                                       
REMSQL_NULL = 5    ' NULL - Empty Column

These data types co-incide with SQLite fundamental data types and are Numerically equivalent to SQL3 Fundamental Datatypes enumeration.

In this design, if a field is empty (NULL) then you have only taken 5 bytes to store it. If a field has 200 bytes of text for example, it only takes 205 bytes to store it. The bigger benefit is in parsing the data since skipping columns can be done without reading through all 200 bytes to find some terminating character.

The Chunk header should contain things like, number of rows, number of columns, total bytes etc etc. If you use DWORDs (unsigned 64bit integers) then the theoretical limit for a chunk is 4.2gigs which should be enough even for local network tranmission.

The implementation requires writing SQLite/MYSQL wrappers for this functionality. I use the BINARY protocol exclusively, which takes a little time, but you essentially need the following functions:
Client Side: SendRequest() – Sends request, waits for response

Server Side: ProcessRequest() – Receives Request, processes it and returns response

In my case, the response can be !00MB of data or more. I retrieve the entire data set from MySQL and save it to disk on the server. Then I return an empty chunk that contains the data set metrics. The client then requests the data set in chunks of 600k, one by one. If the connection is lost, it just picks up where it left off.

Finally, the data set was mostly text (names addresses etc) so ripe for compression. Security was a very big issue in this case so encryption was essential. This does get a little more complicated to implement, but basically you compress the entire chunk, pad to a length that is a multiple of the block ciphers BLOCKSIZE and encrypt it.

In the process of all this I write a very fast string builder class, an implementation of AES encryption in ASM, and an entire FastCGI library (www.coastrd.com)

So as I said, non trivial. I will be making this library available soon. If you want to check it out, email me.

Once you have the communication written then you can begin to design the synchronization. I would either use a hash for each record, or a simple boolean flag. If anything changes on the server, just send the entire record and overwrite it on the client side (assuming you are trying to keep the clients synchronized…)

If you write your own, please post back here about your experience!

PS. Consider changing the title to be more search friendly.. Perhaps something like:

“Synchronizing an SQLite client database with a MySQL server database”

Leave a Comment