Bulk insert in Java using prepared statements batch update

I’ll address your questions in turn.

  • Will the executeBatch method try to send all the data at once?

This can vary with each JDBC driver, but the few I’ve studied iterate over each batch entry and send the arguments, together with the prepared statement handle, to the database for execution each time. That is, in your example above, there would be 50,000 executions of the prepared statement with 50,000 pairs of arguments, but these 50,000 steps can be done in a lower-level “inner loop,” which is where the time savings come in. As a rather stretched analogy, it’s like dropping out of “user mode” down into “kernel mode” and running the entire execution loop there. You save the cost of diving in and out of that lower-level mode for each batch entry.
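
For reference, here is a minimal sketch of the pattern under discussion. It assumes a hypothetical person(id, name) table and a hypothetical Person record, neither of which comes from your question. Each addBatch() snapshots the currently bound parameters into the statement’s batch, and a single executeBatch() hands all of the entries to the driver:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class BatchInsertSketch {

    // Hypothetical row type, for illustration only.
    record Person(int id, String name) {}

    // Queues one argument set per row on a single prepared statement, then
    // executes them all with one executeBatch() call. Returns the driver's
    // per-entry update counts.
    static int[] insertAll(Connection conn, List<Person> people) throws SQLException {
        // "person" is a hypothetical table; adjust the SQL to your schema.
        String sql = "INSERT INTO person (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Person p : people) {
                ps.setInt(1, p.id());
                ps.setString(2, p.name());
                ps.addBatch(); // snapshot the currently bound parameters into the batch
            }
            // A single call; how the driver ships the entries to the server is driver-specific.
            return ps.executeBatch();
        }
    }
}
```

The returned array holds one entry per batch entry. Drivers are allowed to report Statement.SUCCESS_NO_INFO instead of an exact row count, so don’t rely on every value being 1.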

  • Is there a way to define the batch size?

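JDBC has no separate batch-size setting; you define it implicitly by how many argument sets you add before calling Statement#executeBatch(). Here you’ve pushed in 50,000 before executing the batch. A batch size of one is just as valid, as is anything in between: calling executeBatch() every N rows caps each batch at N entries.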
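
If you’d rather cap the batch than queue all 50,000 argument sets at once, one option is to flush every N rows. A minimal sketch, assuming a hypothetical person(name) table and a hypothetical insertInChunks helper; the chunk size is yours to tune against your driver:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class ChunkedBatchInsert {

    // Inserts rows in chunks of at most batchSize argument sets, calling
    // executeBatch() whenever a chunk fills and once more for any remainder.
    // Returns the number of executeBatch() calls made.
    static int insertInChunks(Connection conn, List<String> names, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // Hypothetical table and column; substitute your own.
        String sql = "INSERT INTO person (name) VALUES (?)";
        int batches = 0;
        int pending = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch(); // flush a full chunk
                    batches++;
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the partial final chunk
                batches++;
            }
        }
        return batches;
    }
}
```

With 50,000 names and a batchSize of 1,000, this makes 50 executeBatch() calls and never buffers more than 1,000 argument sets in the statement. Per the JDBC javadoc, a statement’s batch is reset to empty once executeBatch() returns, so no clearBatch() call is needed between chunks.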

  • Is there any better way to speed up the process of bulk insertion?

Consider opening a transaction explicitly before the batch insertion and committing it afterward. Don’t let either the database or the JDBC driver impose a transaction boundary around each insertion step in the batch. You can control the JDBC layer with the Connection#setAutoCommit(boolean) method. Take the connection out of auto-commit mode first; JDBC has no separate “begin” call, so the transaction starts implicitly with the first statement you execute. Then populate your batch, execute it, and commit the transaction via Connection#commit().

This advice assumes that your insertions won’t be contending with concurrent writers, and assumes that these transaction boundaries will give you sufficiently consistent values read from your source tables for use in the insertions. If that’s not the case, favor correctness over speed.
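
The transaction handling described above might look like the following sketch. It assumes the same hypothetical person(name) table and a hypothetical insertInOneTransaction helper. Rolling back on failure and restoring the connection’s previous auto-commit setting are my own choices here, not requirements:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

class TransactionalBatchInsert {

    // Runs the whole batch inside one explicit transaction: auto-commit is
    // switched off, the batch is executed, and the work is committed once.
    // On failure the transaction is rolled back; either way the connection's
    // original auto-commit setting is restored.
    static void insertInOneTransaction(Connection conn, List<String> names) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // the transaction begins implicitly with the first statement
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO person (name) VALUES (?)")) { // hypothetical table
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit(); // one commit covering every entry in the batch
        } catch (SQLException e) {
            try {
                conn.rollback(); // discard any entries applied before the failure
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure); // keep the original cause visible
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```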

  • Is it better to use an updatable ResultSet or PreparedStatement with batch execution?

Nothing beats testing with your JDBC driver of choice, but I expect the latter, PreparedStatement with Statement#executeBatch(), to win out here. The statement handle may have an associated list or array of “batch arguments,” with each entry being the argument set bound before a call to PreparedStatement#addBatch(). The list grows with each call to addBatch() and isn’t flushed until you call executeBatch() (Statement#clearBatch() discards it instead). Hence, the Statement instance is really acting as an argument buffer; you’re trading memory for convenience (using the Statement instance in lieu of your own external argument set buffer).
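
For comparison, here is roughly what the updatable ResultSet route looks like. This is a sketch assuming a hypothetical person(name) table, a hypothetical insertViaResultSet helper, and a driver that supports CONCUR_UPDATABLE result sets; drivers often impose extra conditions on updatability (such as a single-table query), so check yours. Per the JDBC javadoc, each insertRow() writes that one row to the database, so there is no batch buffer for the driver to hand off in one step:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class UpdatableResultSetInsert {

    // Inserts rows through an updatable ResultSet instead of a batched
    // PreparedStatement. Returns the number of rows inserted.
    static int insertViaResultSet(Connection conn, List<String> names) throws SQLException {
        int inserted = 0;
        try (Statement st = conn.createStatement(
                     ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_UPDATABLE);
             // Hypothetical table; WHERE 1 = 0 yields an empty result set whose
             // column metadata is all the insert row needs.
             ResultSet rs = st.executeQuery("SELECT name FROM person WHERE 1 = 0")) {
            for (String name : names) {
                rs.moveToInsertRow();          // position on the special insert row
                rs.updateString("name", name); // stage the column value
                rs.insertRow();                // write this one row to the database
                inserted++;
            }
        }
        return inserted;
    }
}
```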

Again, you should consider these answers general and speculative so long as we’re not discussing a specific JDBC driver. Each driver varies in sophistication, and each will vary in which optimizations it pursues.
