Filtering a spark dataframe based on date

The following solutions are applicable since spark 1.5 :

For lower than :

// filter data where the date is lesser than 2015-03-14
data.filter(data("date").lt(lit("2015-03-14")))      

For greater than :

// filter data where the date is greater than 2015-03-14
data.filter(data("date").gt(lit("2015-03-14"))) 

For equality, you can use either equalTo or === :

data.filter(data("date") === lit("2015-03-14"))

If your DataFrame date column is of type StringType, you can convert it using the to_date function :

// filter data where the date is greater than 2015-03-14
data.filter(to_date(data("date")).gt(lit("2015-03-14"))) 

You can also filter according to a year using the year function :

// filter data where year is greater or equal to 2016
data.filter(year($"date").geq(lit(2016))) 

Leave a Comment