Python - Delete duplicates in a dataframe based on two columns combinations?

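Use np.sort with duplicated. Sorting each row's Name1/Name2 pair makes (Juan, Ale) and (Ale, Juan) identical, so duplicated() flags every repeat of a pair regardless of column order. The expression below selects the flagged rows; prefix the mask with ~ to drop them and keep the first occurrence instead.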

df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
Out[614]: 
  Name1 Name2  Value
1   Ale  Juan      1
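
For reference, here is a minimal self-contained sketch of dropping the repeats rather than selecting them. The sample frame is an assumption (the original data is not shown in full; the third row is added only for illustration), and the mask is converted to a NumPy array so it is applied by position and does not depend on df's index.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 0 and 1 hold the same pair in swapped order.
df = pd.DataFrame({
    "Name1": ["Juan", "Ale", "Ana"],
    "Name2": ["Ale", "Juan", "Luis"],
    "Value": [1, 1, 2],
})

# Sort each row's pair so (Juan, Ale) and (Ale, Juan) become identical.
pairs = np.sort(df[["Name1", "Name2"]].to_numpy(), axis=1)

# Flag every repeat of an already-seen pair. Converting to a NumPy array
# makes the mask positional, so df's index labels do not matter.
is_repeat = pd.DataFrame(pairs).duplicated().to_numpy()

# Keep only the first occurrence of each unordered pair.
deduped = df[~is_repeat]
print(deduped)
```

Passing keep='last' or keep=False to duplicated() would instead keep the last occurrence or drop every row that has a repeat.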

Performance

df=pd.concat([df]*100000, ignore_index=True)

%timeit df[pd.DataFrame(np.sort(df[['Name1','Name2']].values,1)).duplicated()]
10 loops, best of 3: 69.3 ms per loop
%timeit df[~df[['Name1', 'Name2']].apply(frozenset, axis=1).duplicated()]
1 loop, best of 3: 3.72 s per loop
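
Outside IPython, the comparison can be reproduced with the standard timeit module. The sketch below is only an illustration under assumed data: the two-row frame and the helper names mask_sort and mask_frozenset are hypothetical, the input is kept small so it runs quickly, and the absolute timings will differ from the figures above. It also checks that both approaches flag the same rows.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical two-row frame, repeated to get a larger input.
small = pd.DataFrame({
    "Name1": ["Juan", "Ale"],
    "Name2": ["Ale", "Juan"],
    "Value": [1, 1],
})
big = pd.concat([small] * 1000, ignore_index=True)


def mask_sort(frame):
    # Row-wise sort of the two name columns, then flag repeated rows.
    pairs = np.sort(frame[["Name1", "Name2"]].to_numpy(), axis=1)
    return pd.DataFrame(pairs).duplicated().to_numpy()


def mask_frozenset(frame):
    # Build one frozenset per row (order-insensitive), then flag repeats.
    keys = frame[["Name1", "Name2"]].apply(frozenset, axis=1)
    return keys.duplicated().to_numpy()


# Both masks should mark exactly the same rows as repeats.
same = bool((mask_sort(big) == mask_frozenset(big)).all())

t_sort = timeit.timeit(lambda: mask_sort(big), number=3)
t_set = timeit.timeit(lambda: mask_frozenset(big), number=3)
print(f"masks agree: {same}")
print(f"np.sort: {t_sort:.4f} s, frozenset: {t_set:.4f} s (3 runs each)")
```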
