Apache Spark doesn’t provide shared memory between the driver and the executors, so in this snippet:
dataSet.foreach { e =>
items += e
  println("len = " + items.length) // works here: reads the executor-local copy
}
each task modifies its own local, deserialized copy of items
on the respective executor. The original items
list defined on the driver is never modified. As a result, this:
items.foreach { x => print(x) }
executes on the driver, but there is nothing to print.
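If the goal is simply to bring the data back to the driver, the idiomatic fix is to collect (or better, aggregate) on the driver instead of mutating driver-side state from inside a closure. A minimal sketch, assuming a local-mode SparkContext and a dataset small enough to fit in driver memory:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CollectExample {
  def main(args: Array[String]): Unit = {
    // Local-mode context, for illustration only
    val sc = new SparkContext(
      new SparkConf().setAppName("collect-example").setMaster("local[*]"))

    val dataSet = sc.parallelize(Seq("a", "b", "c"))

    // collect() ships all elements back to the driver,
    // so the resulting collection lives where we can print it
    val items = dataSet.collect().toSeq
    items.foreach(println)

    sc.stop()
  }
}
```

Note that collect() moves the entire dataset to the driver, so it is only appropriate for small results; for large data, prefer aggregations such as count or reduce.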
Please check the Understanding closures section of the Spark programming guide.
While it wouldn't be recommended here, you could replace items with an accumulator:
val acc = sc.collectionAccumulator[String]("Items")
dataSet.foreach(e => acc.add(e))
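On the driver you can then read the accumulated elements via acc.value (the element order is not guaranteed, and retried tasks can add elements more than once). A sketch assuming a local-mode SparkContext:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.collection.JavaConverters._

object AccumulatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("acc-example").setMaster("local[*]"))

    val dataSet = sc.parallelize(Seq("a", "b", "c"))

    // CollectionAccumulator gathers elements from all tasks;
    // the merged result is only reliable when read on the driver
    val acc = sc.collectionAccumulator[String]("Items")
    dataSet.foreach(e => acc.add(e))

    // acc.value is a java.util.List; order across partitions is arbitrary
    val items = acc.value.asScala.toSeq
    println(items.sorted.mkString(", "))

    sc.stop()
  }
}
```

Spark only guarantees exactly-once accumulator updates inside actions (not transformations), and only for successfully completed tasks, which is why an accumulator used as a data channel like this is discouraged.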