Caffe | solver.prototxt values setting strategy

In order to set these solver values in a meaningful manner, you need a few more pieces of information about your data:

1. Training set size: the total number of training examples you have. Let's call this quantity T.
2. Training batch size: the number of training examples processed together in a single batch. This is usually set by batch_size in the input data layer of 'train_val.prototxt'. For example, in this file the train batch size is set to 256. Let's denote this quantity by tb.
3. Validation set size: the total number of examples you set aside for validating your model. Let's denote this by V.
4. Validation batch size: the value set in batch_size for the TEST phase. In this example it is set to 50. Let's call this vb.

Now, during training, you would like to get an unbiased estimate of your net's performance every once in a while. To do so, you run your net on the validation set for test_iter iterations. To cover the entire validation set you need test_iter = V/vb (rounded up if V is not an exact multiple of vb).
How often would you like to get this estimate? It's really up to you. If you have a very large validation set and a slow net, validating too often will make training take much longer. On the other hand, not validating often enough may prevent you from noticing if and when your training process failed to converge. test_interval sets how many training iterations pass between validations: for large nets it is usually set on the order of 5,000, and for smaller, faster nets you may choose lower values. Again, it is all up to you.

To cover the entire training set once (completing an "epoch") you need to run T/tb iterations. Usually one trains for several epochs, so max_iter = #epochs * T/tb.
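The max_iter formula as a small Python sketch. The function name and the 90-epoch, 1,280,000-image, tb = 256 setting are hypothetical examples, and rounding up is an assumption so that the last partial epoch is not cut short:

```python
import math


def max_iter_for(T: int, tb: int, epochs: int) -> int:
    """Hypothetical helper: max_iter = #epochs * T / tb, rounded up."""
    return math.ceil(epochs * T / tb)


# Illustrative: 90 epochs over 1,280,000 images with tb = 256 (5,000 iterations per epoch).
print(max_iter_for(1_280_000, 256, 90))  # 450000
```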

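Regarding iter_size: this lets you accumulate gradients over several training mini-batches before each weight update; see this thread for more information. Note that with iter_size > 1 each solver iteration processes tb * iter_size training examples, so one epoch takes T/(tb * iter_size) iterations.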
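Putting the pieces together, here is a sketch that computes the solver values and prints them in solver.prototxt's `key: value` form. It assumes each solver iteration runs iter_size forward/backward passes of tb examples before one weight update. The function name solver_values, the tests_per_epoch knob and all the numbers below are hypothetical, chosen only to illustrate the arithmetic:

```python
import math


def solver_values(T, tb, V, vb, epochs, iter_size=1, tests_per_epoch=1):
    """Hypothetical helper deriving test_iter, test_interval, max_iter and iter_size."""
    # One solver iteration consumes tb * iter_size training examples.
    examples_per_iter = tb * iter_size
    iters_per_epoch = math.ceil(T / examples_per_iter)
    return {
        "test_iter": math.ceil(V / vb),  # one full pass over the validation set
        "test_interval": max(1, iters_per_epoch // tests_per_epoch),
        "max_iter": math.ceil(epochs * T / examples_per_iter),
        "iter_size": iter_size,
    }


# Illustrative: tb = 64 with iter_size = 4 gives an effective batch of 256.
values = solver_values(T=1_280_000, tb=64, V=50_000, vb=50, epochs=90, iter_size=4)
for key, value in values.items():
    print(f"{key}: {value}")
```

With these numbers it prints test_iter: 1000, test_interval: 5000, max_iter: 450000 and iter_size: 4, the same schedule as tb = 256 with iter_size = 1 but with a quarter of the memory per forward/backward pass.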
