Spark: Best practice for retrieving big data from RDD to local machine
Update: The RDD.toLocalIterator method, which was added after the original answer was written, is a more efficient way to do the job. It uses runJob to evaluate only a single partition at each step.

TL;DR: The original answer may still give a rough idea of how it works.

First of all, get the array of partition indexes: …