Finding Overlaps between interval sets / Efficient Overlap Joins

Ha, nice timing :). Just a few days back, overlap joins (or interval joins) were implemented in data.table. The function is foverlaps() and is available from the GitHub project page. Make sure to have a look at ?foverlaps.

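# key ref on the grouping column, then the interval start and end columns
setkey(ref, space, t1, t2)
# keep only map intervals that lie entirely within a ref interval
foverlaps(map, ref, type="within", nomatch=0L)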

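I think this is what you’re after. This returns rows only where there’s a match, and it checks whether each t1,t2 interval in map lies within a t1,t2 interval in ref that has the same space identifier. If you don’t want to restrict matches to the same space, just remove space from the key columns. And if you want to keep every row of map, including those without a match, remove nomatch=0L; the default is nomatch=NA, which returns the unmatched rows as well, with NA in the columns coming from ref.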
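To make the difference between these options concrete, here's a minimal sketch on made-up data. Only the column names space, t1 and t2 follow the code above; the values and the id column are invented for illustration, so treat it as a toy example rather than your actual data.

```r
library(data.table)

# Hypothetical toy data: three reference intervals and four query intervals.
ref <- data.table(space = c("a", "a", "b"),
                  t1    = c(1L, 10L, 1L),
                  t2    = c(5L, 20L, 8L),
                  id    = c("r1", "r2", "r3"))
map <- data.table(space = c("a", "a", "b", "b"),
                  t1    = c(2L, 12L, 3L, 20L),
                  t2    = c(4L, 25L, 6L, 22L))

# The last two key columns are treated as the interval start and end.
setkey(ref, space, t1, t2)

# Matches only: map rows lying within a ref interval of the same space.
inner <- foverlaps(map, ref, type = "within", nomatch = 0L)

# Default nomatch = NA: every map row is kept, and unmatched rows
# get NA in the columns that come from ref.
outer <- foverlaps(map, ref, type = "within")

# Dropping space from the key compares intervals across all spaces.
setkey(ref, t1, t2)
no_space <- foverlaps(map, ref, type = "within", nomatch = 0L)

print(inner)
print(outer)
print(no_space)
```

In this toy data only the first and third rows of map sit inside a ref interval with the same space, so those are the rows the nomatch=0L call keeps. In the result, the interval columns that come from map are prefixed with i. (i.t1, i.t2) to tell them apart from ref's.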

The function is new (but has been rigorously tested) and is therefore not feature complete. If you’ve any suggestions for improvement or come across any issues, please feel free to file an issue.
