Scala Spark, ListBuffer is empty

Apache Spark doesn't provide shared memory between the driver and the executors, therefore in this snippet:

dataSet.foreach { e =>
  items += e
  println("len = " + items.length) //1. here length is ok
}

you modify a local copy of items on the respective executor. The original items list defined on the driver is not modified. As a result, this:

items.foreach { x => print(x) }

executes, but there is nothing to print.
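To make this concrete, here is a minimal, self-contained sketch (the SparkSession setup and the sample data are illustrative assumptions, not taken from your code). Each task prints a non-zero length, but only in the executor logs; the buffer on the driver stays empty:

import scala.collection.mutable.ListBuffer
import org.apache.spark.sql.SparkSession

object ClosureDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("closure-demo").master("local[*]").getOrCreate()
    val sc = spark.sparkContext

    val items = ListBuffer[String]()                 // lives on the driver
    val dataSet = sc.parallelize(Seq("a", "b", "c")) // sample data

    dataSet.foreach { e =>
      items += e                            // mutates the deserialized copy shipped with the task
      println("len = " + items.length)      // non-zero, but printed on the executor side
    }

    println("driver len = " + items.length) // 0 on a cluster; even in local mode the result is not guaranteed

    spark.stop()
  }
}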

Please check Understanding closures in the Spark programming guide.

While it wouldn't be the recommended approach here, you could replace items with an accumulator:

val acc = sc.collectionAccumulator[String]("Items") // defined on the driver
dataSet.foreach(e => acc.add(e))                    // each task adds elements; Spark merges them back on the driver
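
Once the action has run, the merged values can be read back on the driver through acc.value, which returns a java.util.List. A short usage sketch, assuming the snippet above has already executed (the JavaConverters import is my assumption; on Scala 2.13+ use scala.jdk.CollectionConverters._ instead):

import scala.collection.JavaConverters._ // Scala 2.13+: scala.jdk.CollectionConverters._

val itemsOnDriver = acc.value.asScala    // java.util.List[String] -> Buffer[String]
itemsOnDriver.foreach(println)           // the elements are now visible on the driver

Keep in mind that Spark only guarantees exactly-once accumulator updates for code running inside actions such as foreach; updates made inside transformations may be re-applied when tasks are retried.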
