Using Hibernate’s ScrollableResults to slowly read 90 million records

Using setFirstResult and setMaxResults is your only option that I’m aware of.

Traditionally a scrollable resultset would only transfer rows to the client on an as required basis. Unfortunately the MySQL Connector/J actually fakes it, it executes the entire query and transports it to the client, so the driver actually has the entire result set loaded in RAM and will drip feed it to you (evidenced by your out of memory problems). You had the right idea, it’s just shortcomings in the MySQL java driver.

I found no way to get around this, so went with loading large chunks using the regular setFirst/max methods. Sorry to be the bringer of bad news.

Just make sure to use a stateless session so there’s no session level cache or dirty tracking etc.

EDIT:

Your UPDATE 2 is the best you’re going to get unless you break out of the MySQL J/Connector. Though there’s no reason you can’t up the limit on the query. Provided you have enough RAM to hold the index this should be a somewhat cheap operation. I’d modify it slightly, and grab a batch at a time, and use the highest id of that batch to grab the next batch.

Note: this will only work if other_conditions use equality (no range conditions allowed) and have the last column of the index as id.

select * 
from person 
where id > <max_id_of_last_batch> and <other_conditions> 
order by id asc  
limit <batch_size>

Leave a Comment