Select rows from one data.frame that are not present in a second data.frame

sqldf provides a nice solution

a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3])

require(sqldf)

a1NotIna2 <- sqldf('SELECT * FROM a1 EXCEPT SELECT * FROM a2')

And the rows which are in both data frames:

a1Ina2 <- sqldf('SELECT * FROM a1 INTERSECT SELECT * FROM a2')

The new version of dplyr has a function, anti_join, for exactly these kinds of comparisons

require(dplyr) 
anti_join(a1,a2)

And semi_join to filter rows in a1 that are also in a2

semi_join(a1,a2)

Leave a Comment