TL;DR Despite their similar names, these arguments have quite different meanings. The buffer_size in Dataset.shuffle() can affect the randomness of your dataset, and hence the order in which elements are produced. The buffer_size in Dataset.prefetch() only affects the time it takes to produce the next element.
The buffer_size argument in tf.data.Dataset.prefetch() and the output_buffer_size argument in tf.contrib.data.Dataset.map() provide a way to tune the performance of your input pipeline: both arguments tell TensorFlow to create a buffer of at most buffer_size elements, and a background thread that keeps that buffer filled.
(Note that we removed the output_buffer_size argument from Dataset.map() when it moved from tf.contrib.data to tf.data. New code should use Dataset.prefetch() after map() to get the same behavior.)
Adding a prefetch buffer can improve performance by overlapping the preprocessing of data with downstream computation. Typically it is most useful to add a small prefetch buffer (with perhaps just a single element) at the very end of the pipeline, but more complex pipelines can benefit from additional prefetching, especially when the time to produce a single element can vary.
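To make the prefetching idea concrete, here is a minimal pure-Python sketch of a bounded buffer filled by a background thread. It is only a model of the behavior described above, not TensorFlow's implementation; the prefetch function and the _DONE sentinel are names invented for this illustration, and it assumes buffer_size is at least 1 (a queue.Queue with maxsize 0 is unbounded).

```python
import queue
import threading

# Hypothetical sentinel marking the end of the input (not a TensorFlow name).
_DONE = object()


def prefetch(elements, buffer_size):
    """Yield from `elements` while a background thread keeps up to
    `buffer_size` items ready in a bounded queue."""
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in elements:
            buffer.put(item)  # blocks while the buffer is full
        buffer.put(_DONE)

    # Daemon thread so an abandoned consumer does not keep the process alive.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()  # blocks only if the producer has fallen behind
        if item is _DONE:
            return
        yield item


# The order of elements is unchanged; only the waiting time can differ.
result = list(prefetch(range(5), buffer_size=1))
```

Note that the consumer sees exactly the same elements in exactly the same order with or without the buffer, which is why this buffer_size is purely a performance knob. The sketch does not forward exceptions raised in the producer thread.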
By contrast, the buffer_size argument to tf.data.Dataset.shuffle() affects the randomness of the transformation. We designed the Dataset.shuffle() transformation (like the tf.train.shuffle_batch() function that it replaces) to handle datasets that are too large to fit in memory. Instead of shuffling the entire dataset, it maintains a buffer of buffer_size elements, and randomly selects the next element from that buffer (replacing it with the next input element, if one is available). Changing the value of buffer_size affects how uniform the shuffling is: if buffer_size is greater than the number of elements in the dataset, you get a uniform shuffle; if it is 1 then you get no shuffling at all. For very large datasets, a typical “good enough” approach is to randomly shard the data into multiple files once before training, then shuffle the filenames uniformly, and then use a smaller shuffle buffer. However, the appropriate choice will depend on the exact nature of your training job.
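The buffer-based shuffle described above can be sketched in a few lines of plain Python. This is an illustrative model under stated assumptions, not TensorFlow's actual code: buffered_shuffle and its seed parameter are hypothetical names, and the way the buffer is drained at the end of the input is one reasonable choice among several.

```python
import random


def buffered_shuffle(elements, buffer_size, seed=None):
    """Yield `elements` in an order produced by a fixed-size shuffle buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    source = iter(elements)
    buffer = []

    # Fill the buffer with the first `buffer_size` input elements.
    for item in source:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break

    # Emit a randomly chosen buffered element, then put the next
    # input element in the slot it occupied.
    for item in source:
        index = rng.randrange(len(buffer))
        yield buffer[index]
        buffer[index] = item

    # Input exhausted: emit what is left in the buffer, in random order.
    while buffer:
        index = rng.randrange(len(buffer))
        buffer[index], buffer[-1] = buffer[-1], buffer[index]
        yield buffer.pop()


no_shuffle = list(buffered_shuffle(range(10), buffer_size=1))
partial = list(buffered_shuffle(range(100), buffer_size=5, seed=0))
```

With buffer_size=1 the input order comes back unchanged. With buffer_size=5, the element emitted at position j can only come from the first j + 5 inputs, so elements never move far toward the front of the stream; that locality is what makes a small buffer a non-uniform shuffle.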