Default Partitioning Scheme in Spark
You have to distinguish between two different things:

1. Partitioning as distributing data between partitions depending on the value of the key. This is limited to PairwiseRDDs (RDD[(T, U)]) and creates a relationship between a partition and the set of keys that can be found on that partition.
2. Partitioning as splitting input into multiple …
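To make the first sense concrete, here is a minimal sketch in plain Python (no Spark required) of how key-based partitioning ties each key to exactly one partition. The helper names `partition_for` and `partition_by_key` are hypothetical and used only for illustration. The hash-modulo rule is an assumption modeled on the general idea behind a hash partitioner, not a copy of Spark's implementation.

```python
from collections import defaultdict


def partition_for(key, num_partitions):
    """Hypothetical stand-in for a hash partitioner: map a key to a partition index."""
    # In Python, % with a positive modulus never returns a negative number,
    # so the result is always a valid index in [0, num_partitions).
    return hash(key) % num_partitions


def partition_by_key(pairs, num_partitions):
    """Distribute (key, value) pairs across partitions based only on the key."""
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


# Integer keys are used because their hash is stable across Python runs.
pairs = [(1, "a"), (2, "b"), (3, "c"), (1, "d"), (4, "e"), (2, "f")]
partitions = partition_by_key(pairs, num_partitions=2)

# For every partition, collect the set of keys that ended up on it.
keys_per_partition = {
    index: sorted({key for key, _ in records})
    for index, records in partitions.items()
}
print(keys_per_partition)
```

Because the partition index depends only on the key, all records sharing a key land on the same partition, which is the relationship between a partition and its set of keys described above.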