Should you shuffle the samples? Think about the learning process if you don't shuffle: at first caffe sees only 0 samples. What do you expect the algorithm to deduce? Simply predict 0 all the time and everything is cool. If you have plenty of 0 samples before you hit the first 1, caffe will be very confident in always predicting 0, and it will be very difficult to move the model away from this point.
On the other hand, if it constantly sees a mix of 0 and 1 samples, it learns meaningful features for separating the examples from the very beginning.
Bottom line: it is very advantageous to shuffle the training samples, especially when using SGD-based approaches.
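One way to shuffle the training samples is to do it on the list of (file, label) pairs before the input DB is built. This is only a minimal sketch in plain Python: the file names, the `shuffle_samples` helper and the fixed seed are made up for illustration and are not part of caffe.

```python
import random


def shuffle_samples(samples, seed=0):
    """Return a shuffled copy of (item, label) pairs; the input list is left untouched."""
    rng = random.Random(seed)  # fixed seed only so the result is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical data set sorted by label: 8 zeros followed by 4 ones.
samples = [("img_%02d.png" % i, 0) for i in range(8)] + \
          [("img_%02d.png" % i, 1) for i in range(8, 12)]

shuffled = shuffle_samples(samples)

# Same samples, different order; write `shuffled` out instead of `samples`.
for name, label in shuffled:
    print(name, label)
```

Shuffling each pair as a unit keeps every sample attached to its own label, which is the one thing that must not get mixed up.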
AFAIK, caffe does not randomly sample batch_size samples, but rather goes sequentially over the input DB, batch_size samples after batch_size samples.
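To see what sequential reading does to a label-sorted DB, here is a small simulation in plain Python. It does not call caffe at all; the 80/20 label split, the batch size of 10 and the helper names are assumptions chosen just for the illustration.

```python
import random


def sequential_batches(labels, batch_size):
    """Yield consecutive slices of batch_size labels, in stored order."""
    for start in range(0, len(labels), batch_size):
        yield labels[start:start + batch_size]


def positive_fraction_per_batch(labels, batch_size):
    """Fraction of 1 labels in each sequential batch."""
    return [sum(batch) / len(batch)
            for batch in sequential_batches(labels, batch_size)]


# Hypothetical DB sorted by label: 80 zeros, then 20 ones.
sorted_labels = [0] * 80 + [1] * 20
sorted_fractions = positive_fraction_per_batch(sorted_labels, 10)

# The same labels, shuffled once before being read sequentially.
shuffled_labels = list(sorted_labels)
random.Random(0).shuffle(shuffled_labels)
shuffled_fractions = positive_fraction_per_batch(shuffled_labels, 10)

print("sorted:  ", sorted_fractions)
print("shuffled:", shuffled_fractions)
```

With the sorted labels, the first eight batches contain no 1 samples at all and the last two contain nothing else, so each SGD step sees a single class. After shuffling, the 1 samples are spread over the batches instead of being packed at the end.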
TL;DR: shuffle.