Spark ML VectorAssembler returns strange output

There is nothing strange about the output. Your vector seems to have lots of zero elements, so Spark used its sparse representation. To explain further: it seems your vector is composed of 18 elements (its dimension). The indices [0,1,6,9,14,17] of the vector contain the non-zero elements, which are, in order, [17.0,15.0,3.0,1.0,4.0,2.0]. Sparse Vector representation … Read more
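For illustration, here is a minimal sketch of that sparse form built directly with Spark's Vectors factory, using the values quoted above:

```scala
import org.apache.spark.ml.linalg.Vectors

// Sparse representation: (size, indices of non-zero entries, non-zero values).
val sparse = Vectors.sparse(18, Array(0, 1, 6, 9, 14, 17),
                                Array(17.0, 15.0, 3.0, 1.0, 4.0, 2.0))

println(sparse)
// (18,[0,1,6,9,14,17],[17.0,15.0,3.0,1.0,4.0,2.0])

// Expanding it back shows the 12 zero entries the sparse form omits.
println(sparse.toArray.mkString("[", ",", "]"))
```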

Spark textFile vs wholeTextFiles

The main difference, as you mentioned, is that textFile will return an RDD with each line as an element while wholeTextFiles returns a PairRDD with the key being the file path. If there is no need to separate the data depending on the file, simply use textFile. When reading uncompressed files with textFile, it will … Read more
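A minimal sketch of the two calls side by side, assuming a placeholder directory data/ containing several text files and an existing SparkContext:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

def compare(sc: SparkContext): Unit = {
  // One element per line, with no record of which file a line came from.
  val lines: RDD[String] = sc.textFile("data/")

  // One element per file: (file path, entire file content).
  val files: RDD[(String, String)] = sc.wholeTextFiles("data/")

  // e.g. a per-file line count, which is only possible with wholeTextFiles.
  val linesPerFile: RDD[(String, Int)] = files.mapValues(_.split("\n").length)
  linesPerFile.collect().foreach(println)
}
```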

Schema comparison of two dataframes in scala

Based on @Derek Kaknes's answer, here's the solution I came up with for comparing schemas, concerned only with column name, datatype and nullability, and indifferent to metadata: import org.apache.spark.sql.DataFrame import org.apache.spark.sql.types.{DataType, StructField} def getCleanedSchema(df: DataFrame): Map[String, (DataType, Boolean)] = { df.schema.map { (structField: StructField) => structField.name.toLowerCase -> (structField.dataType, structField.nullable) }.toMap } // Compare relevant … Read more
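The excerpt cuts off before the comparison step; a minimal sketch of how the cleaned schemas might then be compared (the schemasMatch helper is an illustrative name, not part of the original answer):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, StructField}

// From the answer above: column name -> (dataType, nullability), metadata ignored.
def getCleanedSchema(df: DataFrame): Map[String, (DataType, Boolean)] = {
  df.schema.map { (structField: StructField) =>
    structField.name.toLowerCase -> (structField.dataType, structField.nullable)
  }.toMap
}

// Hypothetical comparison step (not from the original excerpt): equal maps mean
// the two DataFrames agree on column names, types and nullability.
def schemasMatch(a: DataFrame, b: DataFrame): Boolean =
  getCleanedSchema(a) == getCleanedSchema(b)
```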

Loaner Pattern in Scala

Make sure that whatever you compute is evaluated eagerly and no longer depends on the resource. Scala makes lazy computation fairly easy. For instance, if you wrap scala.io.Source.fromFile in this way, you might try readFile("test.txt")(_.getLines). Unfortunately, this doesn't work because getLines is lazy (it returns an iterator). And Scala doesn't have any great way to indicate … Read more
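A sketch of such a loan-pattern wrapper, showing the lazy and the eagerly evaluated call side by side (readFile is the name assumed in the answer, not a library function):

```scala
import scala.io.Source

// Loan pattern: open the resource, hand it to the caller's function, always close it.
def readFile[A](path: String)(f: Source => A): A = {
  val source = Source.fromFile(path)
  try f(source)
  finally source.close()
}

// Problematic: getLines is lazy, so the iterator would be consumed after close().
// val lines = readFile("test.txt")(_.getLines())

// Safe: force the result inside the block, before the resource is released.
val lines: List[String] = readFile("test.txt")(_.getLines().toList)
```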

Scala single method interface implementation

Scala has experimental support for SAMs starting with 2.11, under the flag -Xexperimental: Welcome to Scala version 2.11.0-RC3 (OpenJDK 64-Bit Server VM, Java 1.7.0_51). Type in expressions to have them evaluated. Type :help for more information. scala> :set -Xexperimental scala> val r: Runnable = () => println("hello world") r: Runnable = $anonfun$1@7861ff33 scala> new Thread(r).run … Read more
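For reference, from Scala 2.12 onward SAM conversion is part of the language and needs no flag; the same snippet compiles as ordinary code:

```scala
// SAM conversion: a lambda where a single-abstract-method type is expected.
val r: Runnable = () => println("hello world")
new Thread(r).start()
```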

scala parallel collections degree of parallelism

With the newest trunk, using JVM 1.6 or newer, use: collection.parallel.ForkJoinTasks.defaultForkJoinPool.setParallelism(parlevel: Int) This may be subject to change in the future, though. A more unified approach to configuring all Scala task-parallel APIs is planned for the next releases. Note, however, that while this will determine the number of processors the query … Read more
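On later Scala versions (2.12 shown here), parallelism can also be set per collection through its task support; a sketch with a placeholder pool size of 4:

```scala
import java.util.concurrent.ForkJoinPool
import scala.collection.parallel.ForkJoinTaskSupport

// Assign a task support backed by a pool of the desired size to one collection.
val pc = (1 to 1000).toVector.par
pc.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(4))

// Subsequent parallel operations on pc use at most 4 worker threads.
val doubledSum = pc.map(_ * 2).sum
```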

Filter spark DataFrame on string contains

You can use contains (this works with an arbitrary sequence): df.filter($"foo".contains("bar")), like (SQL LIKE with the simple SQL pattern syntax, where _ matches an arbitrary character and % matches an arbitrary sequence): df.filter($"foo".like("bar")), or rlike (LIKE with Java regular expressions): df.filter($"foo".rlike("bar")), depending on your requirements. LIKE and RLIKE should work with SQL expressions as well.
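A small self-contained sketch of the three variants plus the SQL-expression forms; the column name and sample data are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("contains-demo").getOrCreate()
import spark.implicits._

// Placeholder data: a single string column named "foo".
val df = Seq("foobar", "barfoo", "baz").toDF("foo")

df.filter($"foo".contains("bar")).show()  // substring anywhere
df.filter($"foo".like("bar%")).show()     // SQL LIKE: starts with "bar"
df.filter($"foo".rlike("^bar")).show()    // Java regex: starts with "bar"

// The SQL-expression equivalents mentioned above.
df.filter("foo LIKE 'bar%'").show()
df.filter("foo RLIKE '^bar'").show()
```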

Is it possible to have tuple assignment to variables in Scala? [duplicate]

This isn’t simply “multiple variable assignment”; it’s fully-featured pattern matching! So the following are all valid: val (a, b) = (1, 2) val Array(a, b) = Array(1, 2) val h :: t = List(1, 2) val List(a, Some(b)) = List(1, Option(2)) This is the way that pattern matching works: it’ll de-construct something into smaller parts, … Read more
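A short sketch of the flip side: because this is ordinary pattern matching, a binding whose pattern does not match fails at runtime with a MatchError:

```scala
val (a, b) = (1, 2)           // a = 1, b = 2
val h :: t = List(1, 2, 3)    // h = 1, t = List(2, 3)

// Compiles, but throws scala.MatchError at runtime: the pattern expects
// exactly two elements and the list has three.
// val List(x, y) = List(1, 2, 3)
```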