How to create a Dataset of Maps?

It is not covered in 2.2, but it can be easily addressed. You can provide the required Encoder using ExpressionEncoder, either explicitly:

```scala
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder

spark
  .createDataset(Seq(Map(1 -> 2)))(ExpressionEncoder(): Encoder[Map[Int, Int]])
```

or implicitly:

```scala
implicit def mapIntIntEncoder: Encoder[Map[Int, Int]] = ExpressionEncoder()

spark.createDataset(Seq(Map(1 -> 2)))
```
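On Spark 2.3 and later this workaround is no longer necessary, since the standard implicits cover `Map` as well. A minimal sketch, assuming a local session:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("map-dataset")
  .getOrCreate()

// On Spark 2.3+ this import brings an implicit Encoder[Map[Int, Int]] into scope,
// so no explicit ExpressionEncoder is needed
import spark.implicits._

val ds = spark.createDataset(Seq(Map(1 -> 2), Map(3 -> 4)))
val rows = ds.collect()
```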

Encode an ADT / sealed trait hierarchy into Spark DataSet column

TL;DR There is no good solution right now, and given the Spark SQL / Dataset implementation, it is unlikely there will be one in the foreseeable future. You can use a generic kryo or java encoder:

```scala
// The Occupation hierarchy is reconstructed here from the example values;
// the exact member signatures are an assumption
sealed trait Occupation
case object SoftwareEngineer extends Occupation
case class Wizard(level: Int) extends Occupation
case class Other(description: String) extends Occupation

val occupation: Seq[Occupation] = Seq(SoftwareEngineer, Wizard(1), Other("foo"))
spark.createDataset(occupation)(org.apache.spark.sql.Encoders.kryo[Occupation])
```

but it is hardly useful in practice. The UDT API provides another possible approach as … Read more

Why is “Unable to find encoder for type stored in a Dataset” when creating a dataset of custom case class?

Spark Datasets require an Encoder for the data type which is about to be stored. For common types (atomics, product types) there is a number of predefined encoders available, but you have to import these first from SparkSession.implicits to make it work:

```scala
val sparkSession: SparkSession = ???
import sparkSession.implicits._

val dataset = sparkSession.createDataset(dataList)
```

Alternatively you can provide … Read more
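Where the excerpt breaks off, one alternative to the implicits is to pass an encoder explicitly. A sketch using `Encoders.product`, with a hypothetical `Person` case class standing in for the custom type:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Hypothetical case class; any Product type (case class, tuple) works
case class Person(name: String, age: Int)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("explicit-encoder")
  .getOrCreate()

// Explicit encoder; no import of spark.implicits._ is required this way
val personEncoder: Encoder[Person] = Encoders.product[Person]

val dataset = spark.createDataset(Seq(Person("Ann", 30)))(personEncoder)
val people = dataset.collect()
```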

Encoder error while trying to map dataframe row to updated row

There is nothing unexpected here. You’re trying to use code which was written for Spark 1.x and is no longer supported in Spark 2.0:

- in 1.x, DataFrame.map is ((Row) ⇒ T)(ClassTag[T]) ⇒ RDD[T]
- in 2.x, Dataset[Row].map is ((Row) ⇒ T)(Encoder[T]) ⇒ Dataset[T]

To be honest it didn’t make much sense in 1.x either. Independent … Read more
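In Spark 2.x the usual way to satisfy the Encoder requirement is to convert the DataFrame to a typed Dataset first, so the encoder is resolved implicitly. A sketch with a hypothetical `Record` case class (not from the original question):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class representing the row schema
case class Record(id: Long, value: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("typed-map")
  .getOrCreate()
import spark.implicits._

val df = Seq(Record(1L, 1.0), Record(2L, 2.0)).toDF()

// Convert the DataFrame (Dataset[Row]) to a typed Dataset, then map;
// the Encoder[Record] needed by map comes from spark.implicits._
val updated = df.as[Record].map(r => r.copy(value = r.value * 2))
val results = updated.collect()
```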

How to store custom objects in Dataset?

Update This answer is still valid and informative, although things are now better since 2.2/2.3, which added built-in encoder support for Set, Seq, Map, Date, Timestamp, and BigDecimal. If you stick to types made only of case classes and the usual Scala types, you should be fine with just the implicits in SQLImplicits. Unfortunately, virtually … Read more
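To illustrate the 2.2/2.3 improvement, a case class mixing the newly supported collection types needs nothing beyond the standard implicits. A sketch with a hypothetical `Event` case class:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical case class using Seq and Map fields,
// which got built-in encoder support in Spark 2.2/2.3
case class Event(tags: Seq[String], counts: Map[String, Int])

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("builtin-encoders")
  .getOrCreate()
import spark.implicits._

// No custom Encoder needed: the implicits cover the nested Seq and Map
val events = spark.createDataset(Seq(
  Event(Seq("a", "b"), Map("a" -> 1, "b" -> 2))
))
val collected = events.collect()
```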