set difference for pandas

This is a bit convoluted, but it works if you want to ignore the index entirely. Convert the contents of each dataframe to a set of tuples, one tuple per row of column values:

ds1 = set(map(tuple, df1.values))
ds2 = set(map(tuple, df2.values))

This step also removes any duplicate rows from each dataframe (the index is ignored), so you end up with something like:

{(1, 2), (3, 4), (2, 3)}   # ds1

You can then use the usual set methods to compare the two, e.g. to find the rows that are in ds1 but not in ds2:

ds1.difference(ds2)

gives:
{(1, 2), (3, 4)}
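
The other set operations work the same way on these tuples. A quick sketch in plain Python, where the contents of ds2 are made up for illustration (the answer only shows ds1):

```python
# ds1 as shown above; ds2 is a hypothetical example, not taken from real data
ds1 = {(1, 2), (3, 4), (2, 3)}
ds2 = {(2, 3), (5, 6)}

only_in_ds1 = ds1 - ds2      # operator form of ds1.difference(ds2)
in_both = ds1 & ds2          # rows present in both (intersection)
in_one_only = ds1 ^ ds2      # rows present in exactly one (symmetric difference)
```

Keep in mind that sets are unordered, so the original row order is lost.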

You can convert the result back to a dataframe if needed. Note that the set has to be converted to a list first, because a set cannot be used to construct a dataframe:

pd.DataFrame(list(ds1.difference(ds2)))
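
Putting the steps together, here is a rough end-to-end sketch. The sample data and the column names "a" and "b" are assumptions for illustration only. It also passes the original column names back in and sorts the rows, since a set has no defined order:

```python
import pandas as pd

# Hypothetical sample frames; column names and values are made up.
# df1 has a duplicate row (1, 2) and a non-default index, both of which get ignored.
df1 = pd.DataFrame({"a": [1, 3, 2, 1], "b": [2, 4, 3, 2]}, index=[10, 11, 12, 13])
df2 = pd.DataFrame({"a": [2, 5], "b": [3, 6]})

# One tuple per row; building the set drops duplicate rows
ds1 = set(map(tuple, df1.values))
ds2 = set(map(tuple, df2.values))

# Sort so the output order is deterministic, then rebuild with df1's column names
diff_rows = sorted(ds1.difference(ds2))
result = pd.DataFrame(diff_rows, columns=df1.columns)
```

Here result holds the rows (1, 2) and (3, 4) with a fresh default index. One caveat: if the columns have mixed dtypes, df.values converts everything to a common dtype first, so the dtypes of the rebuilt dataframe may differ from the original.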
