Convert null values to empty array in Spark DataFrame

You can use a UDF:

```scala
import org.apache.spark.sql.functions.udf

val array_ = udf(() => Array.empty[Int])
```

combined with WHEN or COALESCE:

```scala
df.withColumn("myCol", when(myCol.isNull, array_()).otherwise(myCol))
df.withColumn("myCol", coalesce(myCol, array_())).show
```

In recent versions you can use the array function:

```scala
import org.apache.spark.sql.functions.{array, lit}

df.withColumn("myCol", when(myCol.isNull, array().cast("array<integer>")).otherwise(myCol))
df.withColumn("myCol", coalesce(myCol, array().cast("array<integer>"))).show
```

Please note that it will work only if conversion from string to the … Read more
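For context, here is a minimal, self-contained sketch of the coalesce approach from the excerpt. The input DataFrame, its schema, and the column name `myCol` are illustrative assumptions, not part of the original question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col, udf}

object EmptyArrayDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("empty-array").getOrCreate()
    import spark.implicits._

    // Hypothetical input: an array column containing nulls.
    val df = Seq(
      (1, Seq(1, 2, 3)),
      (2, null)
    ).toDF("id", "myCol")

    // UDF producing an empty Int array, as in the answer above.
    val array_ = udf(() => Array.empty[Int])

    // Replace nulls with an empty array; non-null rows pass through unchanged.
    // Row 2 should show [] instead of null.
    df.withColumn("myCol", coalesce(col("myCol"), array_())).show()

    spark.stop()
  }
}
```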

Save Spark Dataframe into Elasticsearch – Can’t handle type exception

The answer for this one was tricky, but thanks to samklr, I managed to figure out what the problem was. Nevertheless, the solution isn't straightforward and may involve some "unnecessary" transformations. First let's talk about serialization. There are two aspects of serialization to consider in Spark: serialization of data and serialization of functions. In … Read more
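The excerpt cuts off before the actual fix, but as a hedged sketch of the general shape such a solution takes: configure Kryo for data serialization and write through the elasticsearch-spark connector's saveToEs. The index name, node settings, and sample DataFrame below are illustrative assumptions, not the post's actual code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._ // adds saveToEs to DataFrames (elasticsearch-spark connector)

object SaveToEsDemo {
  def main(args: Array[String]): Unit = {
    // Kryo covers serialization of data; closures (functions) are still
    // serialized separately by Spark's Java closure serializer.
    val conf = new SparkConf()
      .setAppName("save-to-es")
      .set("spark.serializer", classOf[KryoSerializer].getName)
      .set("es.nodes", "localhost") // assumed local Elasticsearch node
      .set("es.port", "9200")

    val spark = SparkSession.builder().config(conf).getOrCreate()
    import spark.implicits._

    // Hypothetical DataFrame standing in for the one from the question.
    val df = Seq((1, "alice"), (2, "bob")).toDF("id", "name")

    // Write to a hypothetical index; requires the elasticsearch-spark
    // dependency on the classpath and a reachable cluster.
    df.saveToEs("demo/docs")

    spark.stop()
  }
}
```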