How to filter rows based on difference in dates between rows in R?

Here’s an attempt using rolling joins in the data.table which I believe should be efficient

library(data.table)
# Set minimum distance
mindist <- 91L 
# Make sure it is a real Date
setDT(df)[, date := as.IDate(date)] 
# Create a new column with distance + 1 to roll join too
df[, date2 := date - (mindist + 1L)] 
# Perform a rolling join per each value in df$date2 that has atleast 91 difference from df$date
unique(df[df, on = c(id = "id", date = "date2"), roll = -Inf], by = c("id", "var1"))
#    id var1       date      date2 i.var1     i.date
# 1:  1    A 2005-10-01 2005-10-01      A 2006-01-01
# 2:  1    C 2006-03-02 2006-03-02      C 2006-06-02
# 3:  1    E 2007-08-31 2007-08-31      E 2007-12-01
# 4:  2    F 2007-01-18 2007-01-18      F 2007-04-20

This will give you two additional columns but it’s not a big of a deal IMO. Logically this makes sense and I’ve tested it successfully on different scenarios but it may need some additional proof tests.

Leave a Comment