Function returns an empty List in Spark

This happens because filesInZip is not shared between the driver and the workers. The closure passed to foreach is serialized and shipped to each executor, so foreach operates on a local copy of filesInZip. When the task finishes, that copy is discarded and garbage collected, and the list on the driver is never updated. If you want to keep the results, use a transformation (most likely flatMap) and collect the values it returns.

def listFiles(stream: PortableDataStream): TraversableOnce[String] = ???

zipInputStream.flatMap(listFiles)
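The body of listFiles is left as ??? above. One way to fill it in, as a rough sketch only: read the archive with java.util.zip.ZipInputStream and return the entry names. The helper names entryNames and zipOf are hypothetical and not part of the original answer, and the sketch assumes zipInputStream is an RDD[PortableDataStream] whose elements are ZIP archives. The Spark wiring is shown in comments so the helper can be tried without a cluster.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}

// Hypothetical helper (not from the original answer): returns the entry
// names of a ZIP archive read from an arbitrary InputStream.
def entryNames(in: InputStream): List[String] = {
  val zis = new ZipInputStream(in)
  try {
    Iterator
      .continually(zis.getNextEntry) // yields null once the archive is exhausted
      .takeWhile(_ != null)
      .map(_.getName)
      .toList                        // materialize before the stream is closed
  } finally zis.close()
}

// Hypothetical helper: builds a small in-memory archive for a local check.
def zipOf(names: String*): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val zos = new ZipOutputStream(bytes)
  names.foreach { n =>
    zos.putNextEntry(new ZipEntry(n))
    zos.write(n.getBytes("UTF-8"))
    zos.closeEntry()
  }
  zos.close()
  bytes.toByteArray
}

val names = entryNames(new ByteArrayInputStream(zipOf("a.txt", "dir/b.txt")))

// With Spark on the classpath, the helper would plug into the signature above as:
//   def listFiles(stream: PortableDataStream): TraversableOnce[String] =
//     entryNames(stream.open())
//   val filesInZip = zipInputStream.flatMap(listFiles).collect().toList
```

Because the names are returned from the function rather than appended to an outer variable, nothing depends on mutating state captured in the closure. The final collect() is what brings the results back to the driver.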

You can learn more from the "Understanding closures" section of the Spark programming guide.
