Access Array column in Spark

ArrayType is represented in a Row as a scala.collection.mutable.WrappedArray. You can extract it using for example

val arr: Seq[Double] = r.getAs[Seq[Double]]("x")

or

val i: Int = ???
val arr = r.getSeq[Double](i)

or even:

import scala.collection.mutable.WrappedArray

val arr: WrappedArray[Double] = r.getAs[WrappedArray[Double]]("x")

If DataFrame is relatively thin then pattern matching could be a better approach:

import org.apache.spark.sql.Row

df.rdd.map{case Row(x: Seq[Double]) => (x.toArray, x.sum)}

although you have to keep in mind that the type of the sequence is unchecked.

In Spark >= 1.6 you can also use Dataset as follows:

df.select("x").as[Seq[Double]].rdd

Leave a Comment