Python – Delete duplicates in a dataframe based on two columns combinations?

By using np.sort with duplicated

df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
Out[614]: 
  Name1 Name2  Value
1   Ale  Juan      1

Performance

df=pd.concat([df]*100000)

%timeit df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
10 loops, best of 3: 69.3 ms per loop
%timeit df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
1 loop, best of 3: 3.72 s per loop

Leave a Comment