Behavior of the parameter “mapred.min.split.size” in HDFS

The split size is calculated by the formula:

max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))

In your case (values in MB) it will be:

split size = max(128, min(Long.MAX_VALUE (default), 64)) = 128
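The same arithmetic can be checked with a small standalone sketch. It does not use the Hadoop API; the class and method names (SplitSizeSketch, computeSplitSize) are made up for illustration, and the 64 MB block size is the assumption stated in this answer.

```java
// Hypothetical standalone sketch: reproduces the formula above, not Hadoop's own code.
class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // max(minSize, min(maxSize, blockSize)), all values in bytes
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 128 * MB;        // mapred.min.split.size (value from the question)
        long maxSize = Long.MAX_VALUE;  // mapred.max.split.size (default)
        long blockSize = 64 * MB;       // dfs.block.size (assumed to be 64 MB)

        long splitSize = computeSplitSize(minSize, maxSize, blockSize);
        System.out.println("split size (MB): " + splitSize / MB);         // prints 128
        System.out.println("blocks per split: " + splitSize / blockSize); // prints 2
    }
}
```

With the minimum left below the block size, the same function returns the block size itself, i.e. one block per split.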

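So, regarding your two inferences:

  1. Each map will process 2 HDFS blocks (assuming each block is 64 MB): True

  2. The input file (already stored in HDFS) will be re-divided to occupy 128 MB blocks in HDFS: False. The split size only controls how much input each map task reads; it does not change how the file is stored in HDFS.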

Note, however, that making the minimum split size greater than the block size increases the split size at the cost of data locality: a split that spans two blocks may have to read one of them from another node.
