Apache Spark, add an “CASE WHEN … ELSE …” calculated column to an existing DataFrame

In the upcoming SPARK 1.4.0 release (should be released in the next couple of days). You can use the when/otherwise syntax:

// Create the dataframe
val df = Seq("Red", "Green", "Blue").map(Tuple1.apply).toDF("color")

// Use when/otherwise syntax
val df1 = df.withColumn("Green_Ind", when($"color" === "Green", 1).otherwise(0))

If you are using SPARK 1.3.0 you can chose to use a UDF:

// Define the UDF
val isGreen = udf((color: String) => {
  if (color == "Green") 1
  else 0
})
val df2 = df.withColumn("Green_Ind", isGreen($"color"))

Leave a Comment