Akka-Stream implementation slower than single threaded implementation

Akka Streams uses asynchronous message passing between actors to implement its stream-processing stages. Passing data across an asynchronous boundary has an overhead, which is what you are seeing here: your computation seems to take only about 160 ns (derived from the single-threaded measurement), while the streaming solution takes roughly 1 µs per element, which is dominated by the … Read more
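This is not Akka itself, just a stdlib sketch of the effect the answer describes (all names illustrative): the same per-element work done once as a plain loop and once handed across a queue to another thread, which mimics an asynchronous stage boundary.

```scala
import java.util.concurrent.{ArrayBlockingQueue, CountDownLatch}

val n = 100000L
def work(x: Long): Long = x * 31 + 7  // stand-in for the per-element computation

// Single-threaded baseline: plain loop, no handoff.
var direct = 0L
val t0 = System.nanoTime()
var i = 0L
while (i < n) { direct += work(i); i += 1 }
val directNs = System.nanoTime() - t0

// Same computation, but every element crosses a queue to a consumer thread.
val queue = new ArrayBlockingQueue[java.lang.Long](1024)
val done = new CountDownLatch(1)
var handoff = 0L
val consumer = new Thread(() => {
  var seen = 0L
  while (seen < n) { handoff += work(queue.take()); seen += 1 }
  done.countDown()
})
consumer.start()
val t1 = System.nanoTime()
var j = 0L
while (j < n) { queue.put(j); j += 1 }
done.await()
val queuedNs = System.nanoTime() - t1

println(s"direct: ${directNs / n} ns/element, queued: ${queuedNs / n} ns/element")
```

On a typical JVM the queued version is markedly slower per element, even though both compute the same result; that gap is the asynchronous-boundary cost, independent of Akka.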

Strange behaviour of the Array type with `==` operator

Because the definition of "equals" for arrays is that they refer to the same array. This is consistent with Java's array equality, which uses Object.equals and therefore compares references. If you want to compare elements pairwise, use sameElements:

Array('a', 'b').sameElements(Array('a', 'b'))

or deepEquals, which has been deprecated since 2.8, so instead use:

Array('a', 'b').deep.equals(Array('a', 'b').deep)

There's a good Nabble … Read more
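A short sketch of the options described above (values illustrative), showing how reference equality and element-wise comparison differ on two structurally identical arrays:

```scala
val a = Array('a', 'b')
val b = Array('a', 'b')

val byReference = a == b                        // false: two distinct array objects
val byElements  = a.sameElements(b)             // true: pairwise element comparison
val byJavaUtil  = java.util.Arrays.equals(a, b) // true: Java's element-wise check
```

`java.util.Arrays.equals` works here too because a Scala `Array[Char]` is a plain JVM `char[]`.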

Is there a safe way in Scala to transpose a List of unequal-length Lists?

How about this:

scala> def transpose[A](xs: List[List[A]]): List[List[A]] = xs.filter(_.nonEmpty) match {
     |   case Nil => Nil
     |   case ys: List[List[A]] => ys.map(_.head) :: transpose(ys.map(_.tail))
     | }
warning: there were unchecked warnings; re-run with -unchecked for details
transpose: [A](xs: List[List[A]])List[List[A]]

scala> val ls = List(List(1, 2, 3), List(4, 5), List(6, 7, 8))
ls: … Read more
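The same definition outside the REPL, with the result it produces on the ragged example (columns simply run out when a sub-list is exhausted):

```scala
def transpose[A](xs: List[List[A]]): List[List[A]] =
  xs.filter(_.nonEmpty) match {
    case Nil => Nil
    case ys  => ys.map(_.head) :: transpose(ys.map(_.tail))
  }

val t = transpose(List(List(1, 2, 3), List(4, 5), List(6, 7, 8)))
// t == List(List(1, 4, 6), List(2, 5, 7), List(3, 8))
```

Filtering out empty lists before matching is what makes this safe for unequal lengths: no `head` or `tail` is ever taken on `Nil`.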

Dynamically compiling scala class files at runtime in Scala 2.11

If your goal is to run external Scala classes at runtime, I'd suggest using eval with scala.tools.reflect.ToolBox (it is included in the REPL, but for normal usage you have to add scala-reflect.jar):

import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox

val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
tb.eval(tb.parse("""println("hello!")"""))

You can also compile files using tb.compile. Modified with an example: assume you have an external file … Read more
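A minimal runnable version of the ToolBox snippet above, evaluating an expression instead of a side-effecting println so the result can be inspected (requires scala-reflect on the classpath, as the answer notes):

```scala
import scala.reflect.runtime.universe
import scala.tools.reflect.ToolBox

val tb = universe.runtimeMirror(getClass.getClassLoader).mkToolBox()
// parse turns the source string into an AST; eval compiles and runs it.
val result = tb.eval(tb.parse("""1 + 2"""))
```

`eval` returns `Any`, so in real code you would pattern-match or cast the result to the expected type.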

Scala – initialization order of vals

Vals are initialized in the order they are declared (more precisely, non-lazy vals are), so properties is getting initialized before loadedProps. In other words, loadedProps is still null when properties is getting initialized. The simplest solution here is to define loadedProps before properties:

class Config {
  private val loadedProps = {
    val p = new … Read more
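A self-contained sketch of the trap (class and field names are illustrative, not the asker's): referencing a val that is declared later in the class body compiles, but reads its default value (null) during construction.

```scala
class BadConfig {
  // Runs first: loadedProps is still null here, so this throws an NPE.
  private val firstKey = loadedProps.split("=").head
  private val loadedProps: String = "user=admin"
}

class GoodConfig {
  private val loadedProps: String = "user=admin" // declared (and thus initialized) first
  private val firstKey = loadedProps.split("=").head
  def key: String = firstKey
}
```

An alternative to reordering is making the later-used val `lazy`, which defers its initialization to first access; compiling with `-Xcheckinit` also turns such reads into immediate errors.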

Spark column string replace when present in other column (row)

You could simply use regexp_replace:

df5.withColumn("sentence_without_label", regexp_replace($"sentence", lit($"label"), lit("")))

or you can use a simple udf function as below:

val df5 = spark.createDataFrame(Seq(
  ("Hi I heard about Spark", "Spark"),
  ("I wish Java could use case classes", "Java"),
  ("Logistic regression models are neat", "models")
)).toDF("sentence", "label")
val replace = udf((data: String, rep: String) => data.replaceAll(rep, … Read more
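This is not Spark itself: the sketch below just shows the core string operation the udf wraps (`String.replaceAll`), applied row by row to the same sample data, so you can see the expected output shape without a SparkSession.

```scala
val rows = Seq(
  ("Hi I heard about Spark", "Spark"),
  ("I wish Java could use case classes", "Java"),
  ("Logistic regression models are neat", "models")
)

// For each (sentence, label) pair, strip the label from its own sentence.
val withoutLabel = rows.map { case (sentence, label) =>
  sentence.replaceAll(label, "")
}
// withoutLabel.head == "Hi I heard about "
```

Note that `replaceAll` treats the label as a regex, so labels containing regex metacharacters would need `java.util.regex.Pattern.quote` first.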

Spark Dataframe Random UUID changes after every transformation/action

It is expected behavior. User-defined functions have to be deterministic:

The user-defined functions must be deterministic. Due to optimization, duplicate invocations may be eliminated or the function may even be invoked more times than it is present in the query.

If you want to include a non-deterministic function and preserve the output, you should … Read more
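A stdlib sketch of the behavior described above, without Spark: a non-deterministic expression yields a new value on every evaluation, just as a UUID udf does each time Spark re-invokes it, and preserving the output means materializing it once.

```scala
import java.util.UUID

// Each evaluation produces a fresh value, like a udf re-run per transformation.
def uuidCol(): String = UUID.randomUUID().toString
val reEvaluated = (uuidCol(), uuidCol()) // two calls, two different values

// Materializing the value once is the analogue of caching/checkpointing the
// DataFrame (or, on Spark 2.3+, marking the udf with .asNondeterministic so
// the optimizer stops eliminating or duplicating its invocations).
val once = uuidCol()
val materialized = (once, once)          // the same value both times
```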