scala parallel collections degree of parallelism

With the newest trunk, using the JVM 1.6 or newer, use the: collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(parlevel: Int) This may be a subject to changes in the future, though. A more unified approach to configuring all Scala task parallel APIs is planned for the next releases. Note, however, that while this will determine the number of processors the query … Read more

Convert Java Map to Scala Map

Edit: the recommended way is now to use JavaConverters and the .asScala method: import scala.collection.JavaConverters._ val myScalaMap = myJavaMap.asScala.mapValues(_.asScala.toSet) This has the advantage of not using magical implicit conversions but explicit calls to .asScala, while staying clean and consise. The original answer with JavaConversions: You can use scala.collection.JavaConversions to implicitly convert between Java and Scala: … Read more

Scala Sets contain the same elements, but sameElements() returns false

The Scala collections library provides specialised implementations for Sets of fewer than 5 values (see the source). The iterators for these implementations return elements in the order in which they were added, rather than the consistent, hash-based ordering used for larger Sets. Furthermore, sameElements (scaladoc) is defined on Iterables (it is implemented in IterableLike – … Read more

Function returns an empty List in Spark

It happens because filesInZip is not shared between workers. foreach operates on a local copy of filesInZip and when it finishes this copy is simply discarded and garbage collected. If you want to keep the results you should use transformation (most likely a flatMap) and return collected aggregated values. def listFiles(stream: PortableDataStream): TraversableOnce[String] = ??? … Read more