rdd - w3toppers.com

How does HashPartitioner work?

Well, lets make your dataset marginally more interesting: val rdd = sc.parallelize(for { x <- 1 to 3 y <- 1 to 2 } yield (x, None), 8) We have six elements: rdd.count Long = 6 no partitioner: rdd.partitioner Option[org.apache.spark.Partitioner] = None and eight partitions: rdd.partitions.length Int = 8 Now lets define small helper to … Read more

Spark – repartition() vs coalesce()

It avoids a full shuffle. If it’s known that the number is decreasing then the executor can safely keep data on the minimum number of partitions, only moving the data off the extra nodes, onto the nodes that we kept. So, it would go something like this: Node 1 = 1,2,3 Node 2 = 4,5,6 … Read more

Unable to fetch the value of Println in apache spark

You have an extrat upper case try this : rddUnion.foreach(println)