Why is the fold action necessary in Spark?

Empty RDD It cannot be substituted when RDD is empty: val rdd = sc.emptyRDD[Int] rdd.reduce(_ + _) // java.lang.UnsupportedOperationException: empty collection at // org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$apply$ … rdd.fold(0)(_ + _) // Int = 0 You can of course combine reduce with condition on isEmpty but it is rather ugly. Mutable buffer Another use case for fold is … Read more

How does foldr work?

The easiest way to understand foldr is to rewrite the list you’re folding over without the sugar. [1,2,3,4,5] => 1:(2:(3:(4:(5:[])))) now what foldr f x does is that it replaces each : with f in infix form and [] with x and evaluates the result. For example: sum [1,2,3] = foldr (+) 0 [1,2,3] [1,2,3] … Read more

Difference between fold and foldLeft or foldRight?

Short answer: foldRight associates to the right. I.e. elements will be accumulated in right-to-left order: List(a,b,c).foldRight(z)(f) = f(a, f(b, f(c, z))) foldLeft associates to the left. I.e. an accumulator will be initialized and elements will be added to the accumulator in left-to-right order: List(a,b,c).foldLeft(z)(f) = f(f(f(z, a), b), c) fold is associative in that the … Read more

Scala : fold vs foldLeft

The method fold (originally added for parallel computation) is less powerful than foldLeft in terms of types it can be applied to. Its signature is: def fold[A1 >: A](z: A1)(op: (A1, A1) => A1): A1 This means that the type over which the folding is done has to be a supertype of the collection element … Read more