What you're looking for is `Seq[o.a.s.sql.Row]`:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

val my_size = udf { subjects: Seq[Row] => subjects.size }
```
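A minimal usage sketch of the UDF above, assuming a local `SparkSession`; the DataFrame, the `Subject` case class, and the column names are illustrative, not from the original question:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// A case class in the input data becomes a struct column,
// so "subjects" has type array<struct<name:string,grade:int>>.
case class Subject(name: String, grade: Int)

val df = Seq(
  (1, Seq(Subject("math", 5), Subject("physics", 4))),
  (2, Seq(Subject("chemistry", 3)))
).toDF("id", "subjects")

// Inside the UDF each struct element arrives as a Row.
df.select($"id", my_size($"subjects").as("n_subjects")).show()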
Explanation:

- The current representation of `ArrayType` is, as you already know, `WrappedArray`, so `Array` won't work and it is better to stay on the safe side.
- According to the official specification, the local (external) type for `StructType` is `Row`. Unfortunately this means that access to the individual fields is not type safe.
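To illustrate the lack of type safety, field access on a `Row` goes through `getAs` (or positional getters) and is only checked at runtime; the field name `"name"` below is an assumption about the struct's schema, not something from the question:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// getAs[String]("name") will throw at runtime if the field is missing
// or has a different type; the compiler cannot catch this.
val first_name = udf { subjects: Seq[Row] =>
  subjects.headOption.map(_.getAs[String]("name"))
}
```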
Notes:

- To create a `struct` in Spark < 2.3, the function passed to `udf` has to return a `Product` type (`Tuple*` or a `case class`), not `Row`. That's because the corresponding `udf` variants depend on Scala reflection:

  > Defines a Scala closure of n arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature.
- In Spark >= 2.3 it is possible to return `Row` directly, as long as the schema is provided:

  ```scala
  def udf(f: AnyRef, dataType: DataType): UserDefinedFunction
  ```

  > Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion.

  See for example How to create a Spark UDF in Java / Kotlin which returns a complex type?.
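The two struct-returning variants above can be sketched side by side; the `NameAndLength` case class, the field names, and the example function are illustrative:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types._

// Spark < 2.3: return a Product (here, a case class); the struct
// schema is inferred from the closure's signature via reflection.
case class NameAndLength(name: String, length: Int)
val summarize_product = udf { s: String => NameAndLength(s, s.length) }

// Spark >= 2.3: return Row directly, but the output schema must be
// supplied explicitly and fields are not checked at compile time.
val schema = StructType(Seq(
  StructField("name", StringType),
  StructField("length", IntegerType)
))
val summarize_row = udf((s: String) => Row(s, s.length), schema)
```

Note that, as far as I know, in Spark 3.x this untyped `udf(AnyRef, DataType)` variant is restricted by default and requires enabling `spark.sql.legacy.allowUntypedScalaUDF`.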