Schema comparison of two DataFrames in Scala
Based on @Derek Kaknes's answer, here's the solution I came up with for comparing schemas. It is concerned only with column name, datatype and nullability, and is indifferent to metadata:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.types.{DataType, StructField}

    def getCleanedSchema(df: DataFrame): Map[String, (DataType, Boolean)] = {
      df.schema.map { (structField: StructField) =>
        structField.name.toLowerCase -> (structField.dataType, structField.nullable)
      }.toMap
    }

    // Compare relevant …
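The comparison step itself is cut off above, so what follows is only a sketch of one way two cleaned schema maps could be compared, not the original answer's code. The names `SchemaCompare`, `SchemaDiff` and `diffSchemas` are hypothetical. The helper is written generically over the per-column value so it runs without Spark; with `getCleanedSchema` that value would be `(DataType, Boolean)`.

```scala
// Hypothetical helper, not part of the original answer.
object SchemaCompare {

  // Result of comparing two schemas keyed by (lower-cased) column name.
  final case class SchemaDiff[A](
      onlyInLeft: Set[String],         // columns present only in the first schema
      onlyInRight: Set[String],        // columns present only in the second schema
      mismatched: Map[String, (A, A)]  // shared columns whose attributes differ
  ) {
    // True when both schemas have the same columns with the same attributes.
    def isEmpty: Boolean =
      onlyInLeft.isEmpty && onlyInRight.isEmpty && mismatched.isEmpty
  }

  // A is whatever describes a column, e.g. (DataType, Boolean) for
  // the map returned by getCleanedSchema.
  def diffSchemas[A](left: Map[String, A], right: Map[String, A]): SchemaDiff[A] = {
    val shared = left.keySet.intersect(right.keySet)
    val mismatched = shared.iterator
      .filter(name => left(name) != right(name))
      .map(name => (name, (left(name), right(name))))
      .toMap
    SchemaDiff(
      left.keySet.diff(right.keySet),
      right.keySet.diff(left.keySet),
      mismatched
    )
  }
}
```

Assuming two DataFrames named `df1` and `df2` (placeholder names), `SchemaCompare.diffSchemas(getCleanedSchema(df1), getCleanedSchema(df2)).isEmpty` would then report whether the schemas match on name, datatype and nullability, and the three fields of the result say where they differ. Returning a diff rather than a bare Boolean is a design choice made here for easier debugging.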