How to "select distinct" across multiple data frame columns in pandas?

You can use the drop_duplicates method to get the unique rows in a DataFrame:

In [28]: import pandas as pd

In [29]: df = pd.DataFrame({'a':[1,2,1,2], 'b':[3,4,3,5]})

In [30]: df
Out[30]:
   a  b
0  1  3
1  2  4
2  1  3
3  2  5

In [32]: df.drop_duplicates()
Out[32]:
   a  b
0  1  3
1  2  4
3  2  5

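You can also pass the subset keyword argument if only certain columns should be used to determine uniqueness. By default the first row of each group of duplicates is kept; the keep argument changes that. See the drop_duplicates docstring for details.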
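As a minimal sketch of the subset option, reusing the same example frame as above: the variable names (by_a, by_a_last, distinct_ab) are illustrative only, and the last statement shows one common way to mimic SQL's SELECT DISTINCT over chosen columns by selecting them first.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 1, 2], 'b': [3, 4, 3, 5]})

# Use only column 'a' to decide which rows are duplicates.
# The first row seen for each value of 'a' is kept by default.
by_a = df.drop_duplicates(subset=['a'])

# Same check on 'a', but keep the last occurrence instead of the first.
by_a_last = df.drop_duplicates(subset=['a'], keep='last')

# Rough equivalent of "SELECT DISTINCT a, b": pick the columns, then drop duplicates.
distinct_ab = df[['a', 'b']].drop_duplicates()

print(by_a)
print(by_a_last)
print(distinct_ab)
```

Note that subset keeps every column in the result and only changes which columns are compared, whereas selecting the columns first returns just those columns.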
