How to drop duplicates based on two or more columns in a Pandas DataFrame

Your syntax is wrong. Here’s the correct way:

df.drop_duplicates(subset=['bio', 'center', 'outcome'])

Or, in this specific case, since those three columns are all the columns in the DataFrame, simply:

df.drop_duplicates()

Both return the following:

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f

Take a look at the df.drop_duplicates documentation for syntax details. The subset argument should be a single column label or a sequence of column labels.
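For a runnable check, here is a minimal sketch. The input DataFrame is an assumption: the original data is not shown here, so this one is built only to be consistent with the output above (row 1 repeats row 0). It compares the explicit subset call with the no-argument call, then shows how a narrower subset changes which rows survive.

```python
import pandas as pd

# Hypothetical input, chosen to match the output shown above:
# row 1 is an exact copy of row 0.
df = pd.DataFrame({
    "bio": [1, 1, 1, 4],
    "center": ["one", "one", "two", "three"],
    "outcome": ["f", "f", "f", "f"],
})

# Compare rows on the three listed columns only.
by_subset = df.drop_duplicates(subset=["bio", "center", "outcome"])

# With no subset, every column is compared; here that is the same three columns.
all_columns = df.drop_duplicates()

print(by_subset)                      # rows with index 0, 2, 3
print(by_subset.equals(all_columns))  # True

# A narrower subset treats more rows as duplicates:
# only the first row for each distinct "bio" value is kept.
by_bio = df.drop_duplicates(subset=["bio"])
print(by_bio.index.tolist())          # [0, 3]
```

By default the first occurrence of each duplicate group is kept (keep='first'), and the result is a new DataFrame, so assign it back to df if you want to keep the change.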
