Does Caffe need data to be shuffled?

Should you shuffle the samples? Think about what the learning process looks like if you don't. If the samples are stored sorted by label, Caffe first sees only samples with label 0. What do you expect the algorithm to deduce? Simply predict 0 all the time, and everything is cool. If there are plenty of 0 samples before you hit the first 1, Caffe will become very confident in always predicting 0, and it will be very difficult to move the model away from this point.
On the other hand, if it constantly sees a mix of 0 and 1 samples, it learns meaningful features for separating the two classes from the very beginning.
Bottom line: it is very advantageous to shuffle the training samples, especially when using SGD-based approaches.
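To make the argument concrete, here is a toy simulation in plain Python. It is not Caffe, and every name in it is hypothetical: a "model" with a single bias parameter b, so that P(y=1) = sigmoid(b), trained with mini-batch SGD on the logistic loss while reading batches in stored order. It compares a data set stored sorted by label with the same data shuffled, looking at the model after it has consumed the first half of the data.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bias_only(labels, batch_size, lr, num_batches):
    """Mini-batch SGD on the logistic loss of a model with a single bias
    term b, i.e. P(y=1) = sigmoid(b). Batches are taken sequentially,
    in the order the labels are stored."""
    b = 0.0
    for k in range(num_batches):
        batch = labels[k * batch_size:(k + 1) * batch_size]
        # Gradient of the mean logistic loss w.r.t. b is mean(sigmoid(b) - y).
        grad = sum(sigmoid(b) - y for y in batch) / len(batch)
        b -= lr * grad
    return sigmoid(b)  # the model's predicted P(y=1) after training

# Hypothetical toy data set: 500 negatives stored before 500 positives.
sorted_labels = [0] * 500 + [1] * 500
shuffled_labels = sorted_labels[:]
random.Random(0).shuffle(shuffled_labels)

# Stop after 10 batches of 50, i.e. after the first half of the data.
p_sorted = train_bias_only(sorted_labels, batch_size=50, lr=1.0, num_batches=10)
p_shuffled = train_bias_only(shuffled_labels, batch_size=50, lr=1.0, num_batches=10)

print(f"sorted:   P(y=1) = {p_sorted:.3f}")    # well below 0.5: drifting toward "always 0"
print(f"shuffled: P(y=1) = {p_shuffled:.3f}")  # stays near the true rate of 0.5
```

With the sorted data, the ten all-0 batches push b steadily negative, so the model has learned little beyond "predict 0"; with the shuffled data each batch is roughly half positives and the estimate stays near 0.5. A real network is far more complex than one bias term, so treat this only as a sketch of the effect described above.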

AFAIK, Caffe does not randomly sample batch_size samples; rather, it reads the input DB sequentially, batch_size samples at a time.
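Because the reader walks the DB in stored order, the shuffling has to happen before the DB is written (Caffe's `convert_imageset` tool has a `--shuffle` flag for this). The sketch below is a hypothetical stand-in for such a sequential reader, not Caffe's actual data layer code; the wrap-around at the end of the DB is an assumption added so the reader can run for more than one epoch.

```python
import random

def sequential_batches(db, batch_size, num_batches):
    """Hypothetical sequential reader: yields batch_size records at a time
    in stored order and wraps around to the first record at the end of
    the DB. No random sampling is involved."""
    cursor = 0
    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            batch.append(db[cursor])
            cursor = (cursor + 1) % len(db)  # wrap around at the end of the DB
        yield batch

# Hypothetical records (sample_id, label), stored sorted by label.
records = [(i, 0) for i in range(6)] + [(i, 1) for i in range(6, 12)]

# The reader never shuffles, so shuffle once before "writing" the DB.
db = records[:]
random.Random(42).shuffle(db)

for batch in sequential_batches(db, batch_size=4, num_batches=3):
    print([label for _, label in batch])
```

Fed the unshuffled `records` instead, the first two batches would contain only 0 labels, which is exactly the situation described earlier.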

TL;DR
Shuffle.
