If you take a look at the signature
textFile(path: String, minPartitions: Int = defaultMinPartitions): RDD[String]
you’ll see that the argument you use is called minPartitions
and that pretty much describes its function: it is a lower bound on the number of partitions, not an exact count. In some cases even that is ignored, but that is a different matter. The input format used behind the scenes still decides how to compute the splits.
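To make that concrete, here is a rough sketch of the split-size rule applied by Hadoop's old-API FileInputFormat, which is what textFile relies on for plain text files. The object and method names are made up for illustration, and the sketch assumes a single splittable file; the real implementation also handles multiple files, non-splittable compression and a small slop factor on the last split.

```scala
object SplitSizeSketch {
  // Simplified form of the rule: splitSize = max(minSplitSize, min(goalSize, blockSize)),
  // where goalSize is the total input size divided by the requested number of partitions.
  def splitSize(totalSize: Long, minPartitions: Int, minSplitSize: Long, blockSize: Long): Long = {
    val goalSize = totalSize / math.max(minPartitions, 1)
    math.max(minSplitSize, math.min(goalSize, blockSize))
  }

  // Approximate number of splits for one splittable file (ceiling division,
  // ignoring the slop factor Hadoop applies to the final split).
  def approxSplits(totalSize: Long, splitSize: Long): Long =
    (totalSize + splitSize - 1) / splitSize
}
```

Under these assumptions, a 1 GiB file with 128 MiB blocks and minPartitions = 2 still ends up with 128 MiB splits, so roughly 8 partitions rather than 2. Raising the minimum split size to 512 MiB is what brings it down to 2.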
In this particular case you could probably use mapred.min.split.size
to increase the split size (this takes effect while the data is being loaded) or simply repartition
after loading (this takes effect only once the data has been loaded), but in general there should be no need for that.
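A minimal sketch of both options, assuming a local Spark setup. The input path, the application name and the 256 MiB value are placeholders, not something taken from the question. Newer Hadoop versions treat mapred.min.split.size as a deprecated alias, so whether it is honored may depend on your Hadoop version.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MinPartitionsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("min-partitions-example") // hypothetical application name
      .setMaster("local[*]")
    val sc = new SparkContext(conf)

    val path = "data/input.txt" // hypothetical input path

    // Option 1: raise the minimum split size before loading, so the
    // input format produces fewer, larger splits (value is in bytes).
    sc.hadoopConfiguration.set("mapred.min.split.size", (256L * 1024 * 1024).toString)
    val loaded = sc.textFile(path, minPartitions = 2)
    println(s"partitions after load: ${loaded.partitions.length}")

    // Option 2: load with whatever splits the input format chooses,
    // then reshuffle into the desired number of partitions.
    val repartitioned = loaded.repartition(2)
    println(s"partitions after repartition: ${repartitioned.partitions.length}")

    sc.stop()
  }
}
```

Keep in mind that repartition triggers a full shuffle, which is one more reason to leave the partitioning alone unless there is a concrete problem to solve.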