How to join two dataframes by nearest time-date?

data.table should work for this (can you explain the error you’re coming up against?), although it does tend to convert POSIXlt to POSIXct on its own (perhaps do that conversion on your datetime column manually to keep data.table happy). Also make sure you’re setting the key column before using roll.

(I’ve created my own example tables here to make my life that little bit easier. If you want to use dput on yours, I’m happy to update this example with your data):

new <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00" ) ), data.new = c( "t","u","v" ) )
head( new, 2 )

                  date data.new
1: 2016-03-02 12:20:00        t
2: 2016-03-07 12:20:00        u

old <- data.table( date = as.POSIXct( c( "2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00" ) ), data.old = c( "a","b","c","d" ) )
head( old, 2 )


                  date data.old
1: 2016-03-02 12:20:00        a
2: 2016-03-07 12:20:00        b

setkey( new, date )
setkey( old, date )

combined <- new[ old, roll = "nearest" ]
combined

                  date data.new data.old
1: 2015-03-02 12:20:00        t        d
2: 2016-03-02 12:20:00        t        a
3: 2016-03-07 12:20:00        u        b
4: 2016-04-02 12:20:00        v        c

I’ve intentionally made the two tables different row lengths, in order to show how the rolling join deals with multiple matches. You can switch the way it joins with:

combined <- old[ new, roll = "nearest" ]
combined

                  date data.old data.new
1: 2016-03-02 12:20:00        a        t
2: 2016-03-07 12:20:00        b        u
3: 2016-04-02 12:20:00        c        v

Leave a Comment