Perform a semi-join with data.table

More possibilities:

w = unique(x[y,which=TRUE,nomatch=0L])  # the row numbers in x which have a match from y (nomatch=0L drops y rows with no match, which would otherwise come back as NA)
x[w]

If x has duplicate key values, the join can return more rows than data.table allows by default, so it needs allow.cartesian=TRUE:

w = unique(x[y,which=TRUE,nomatch=0L,allow.cartesian=TRUE])  # nomatch=0L keeps NA out of w when a y row has no match
x[w]
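
A minimal sketch of the x[y] approach with duplicate keys on both sides. The sample tables and the column names id and val are made up for illustration; nomatch = 0L is there so that y rows without a match do not put NA into w.

```r
library(data.table)

# Hypothetical sample data: both tables are keyed on a column named id.
x <- data.table(id = c(1L, 2L, 2L, 3L), val = c("a", "b", "c", "d"), key = "id")
y <- data.table(id = c(2L, 2L, 3L, 5L), key = "id")

# Each y row contributes the x row numbers it matches. nomatch = 0L drops
# y rows that match nothing (id 5), and unique() collapses repeated hits.
w <- unique(x[y, which = TRUE, nomatch = 0L, allow.cartesian = TRUE])

# Semi-join result: every x row that has at least one match in y, once each.
semi <- x[w]
print(semi)
```

Every matching row of x appears once, however many times its key occurs in y.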

Or, the other way around, looking up each row of x in y:

setkey(y,x)  # key y on its join column (assumed here to be a column named x)
w = !is.na(y[x,which=TRUE,mult="first"])
x[w]
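
The same idea as a self-contained sketch, again with made-up tables keyed on a hypothetical id column. Here the lookup returns one value per row of x, so the result is a logical filter rather than a set of row numbers.

```r
library(data.table)

# Hypothetical sample data: both tables are keyed on a column named id.
x <- data.table(id = c(1L, 2L, 2L, 3L), val = c("a", "b", "c", "d"), key = "id")
y <- data.table(id = c(2L, 2L, 3L, 5L), key = "id")

# For each x row, find the first matching row number in y; NA means no match.
first_match <- y[x, which = TRUE, mult = "first"]

# Keep the x rows that found a match.
keep <- !is.na(first_match)
semi <- x[keep]
print(semi)
```

Because mult = "first" returns at most one y row per x row, duplicates in y cannot multiply the result, and no unique() step is needed.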

If nrow(x) << nrow(y) then the y[x] approach should be faster.
If nrow(x) >> nrow(y) then the x[y] approach should be faster.

But the anti anti join appeals too 🙂
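
One reading of "anti anti join" is two not-joins in a row: first find the x rows with no match in y, then drop those from x. The sketch below assumes that reading. The tables and the id column are hypothetical, and on = is spelled out so the example does not depend on keys.

```r
library(data.table)

# Hypothetical sample data, joined on a column named id.
x <- data.table(id = c(1L, 2L, 2L, 3L), val = c("a", "b", "c", "d"))
y <- data.table(id = c(2L, 2L, 3L, 5L))

# Anti-join: x rows whose id has no match in y.
unmatched <- x[!y, on = "id"]

# Anti-join again, this time against the unmatched rows. What is left
# is the x rows that do have a match in y, which is the semi-join.
semi <- x[!unmatched, on = "id"]
print(semi)
```

Doing it in two named steps avoids nesting one `[` call inside another, at the cost of an intermediate table.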
