How can I change column types in Spark SQL’s DataFrame?

Edit: Most recent version

Since Spark 2.x, when using Scala you should use the Dataset API instead [1]. Check the docs for withColumn(colName: String, col: Column): DataFrame here:
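As a minimal sketch of the Dataset route, assume a local SparkSession and a made-up Person schema in which age arrives as a string. You can cast the column with withColumn and then bind the untyped DataFrame to a typed Dataset with .as[...]. The Person case class and the sample rows are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Hypothetical target type; field names must match the column names
case class Person(name: String, age: Int)

val spark = SparkSession.builder().master("local[*]").appName("cast-to-dataset").getOrCreate()
import spark.implicits._

// Made-up input where "age" is a string column
val raw = Seq(("Alice", "2"), ("Bob", "5")).toDF("name", "age")

// Cast "age" to int first, then switch to the typed Dataset API
val people: Dataset[Person] = raw
  .withColumn("age", col("age").cast(IntegerType))
  .as[Person]
```

The explicit cast matters here. Without it, .as[Person] would be expected to fail at analysis time, because Spark does not silently narrow a string column to an Int field.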

If you are working with Python it is even easier, but since this is a very highly voted question I'll leave the link here anyway:

>>> df.withColumn('age2', df.age + 2).collect()
[Row(age=2, name='Alice', age2=4), Row(age=5, name='Bob', age2=7)]


[1] "In the Scala API, DataFrame is simply a type alias of Dataset[Row]. While, in Java API, users need to use Dataset<Row> to represent a DataFrame." (Spark SQL programming guide)
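To make the type-alias point concrete, here is a small sketch (assuming a local SparkSession and made-up name/age data). It shows that a DataFrame can be assigned to a Dataset[Row] and back with no conversion, and that withColumn, which is defined on Dataset, works directly on a DataFrame. The last line mirrors the Python example above in Scala.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("alias-check").getOrCreate()
import spark.implicits._

// Made-up example data
val df: DataFrame = Seq(("Alice", 2), ("Bob", 5)).toDF("name", "age")

// DataFrame is just Dataset[Row], so both assignments compile without any conversion
val asDataset: Dataset[Row] = df
val backToDf: DataFrame = asDataset

// withColumn comes from Dataset and is usable on the DataFrame directly
val withAge2: DataFrame = df.withColumn("age2", df("age") + 2)
```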

Edit: Previous version

Since Spark 2.x you can use .withColumn. Check the docs for withColumn(colName: String, col: Column): DataFrame here:
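A minimal sketch of that, assuming a local SparkSession and a made-up year/make dataset where year was loaded as a string. When withColumn is given the name of an existing column, it replaces that column in place. The cast therefore needs no temporary column and no rename.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("withColumn-cast").getOrCreate()
import spark.implicits._

// Made-up data: "year" is a string column
val df = Seq(("2012", "Tesla"), ("1997", "Ford")).toDF("year", "make")

// Reusing the name "year" replaces the existing column, keeping its position
val df2 = df.withColumn("year", col("year").cast(IntegerType))

df2.printSchema()
```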

Oldest answer

Since Spark 1.4 you can call the cast method on the column, passing a DataType:

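import org.apache.spark.sql.types.IntegerType
// Cast into a temporary column, drop the original, then rename the temp column back
val df2 = df.withColumn("yearTmp", df("year").cast(IntegerType))
    .drop("year")
    .withColumnRenamed("yearTmp", "year")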
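Column.cast also accepts the target type as a string name, such as "int". The sketch below uses made-up data and a local SparkSession. It compares the string form with the DataType form and shows that both produce the same schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("cast-forms").getOrCreate()
import spark.implicits._

// Made-up data: "year" is a string column
val df = Seq(("2012", "Tesla"), ("1997", "Ford")).toDF("year", "make")

// Cast using a type name string...
val viaString = df.withColumn("year", df("year").cast("int"))
// ...and using the DataType object; the resulting schemas should match
val viaType = df.withColumn("year", df("year").cast(IntegerType))
```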

If you are using SQL expressions, you can also do:

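// selectExpr returns only the listed expressions, so also list the other columns you need
val df2 = df.selectExpr("cast(year as int) year" /* , remaining columns */)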
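Here is a complete version of the selectExpr approach, assuming a local SparkSession and a made-up three-column dataset (year, make, model). Because selectExpr projects only the expressions you pass, every column you want to keep has to be named. The "cast(...) year" alias keeps the original column name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.IntegerType

val spark = SparkSession.builder().master("local[*]").appName("selectExpr-cast").getOrCreate()
import spark.implicits._

// Made-up data: "year" is a string column
val df = Seq(("2012", "Tesla", "S"), ("1997", "Ford", "E350")).toDF("year", "make", "model")

// SQL cast with an alias, plus the untouched columns listed explicitly
val df2 = df.selectExpr("cast(year as int) year", "make", "model")
```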

For more info check the docs:

