In PySpark 2.1.0, the drop method supports multiple columns:
PySpark 2.0.2:
DataFrame.drop(col)
PySpark 2.1.0:
DataFrame.drop(*cols)
Example:
df.drop('col1', 'col2')
or using the * operator:
df.drop(*['col1', 'col2'])
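For illustration, here is a minimal, self-contained sketch of both calling styles; the SparkSession setup and the column names (id, col1, col2) are assumptions for the example, not part of the original answer:

from pyspark.sql import SparkSession

# Assumed setup for the example
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1, 'a', 10.0), (2, 'b', 20.0)],
    ['id', 'col1', 'col2'],
)

# Pass column names as separate arguments (PySpark 2.1.0+)
df.drop('col1', 'col2').show()

# Or unpack a list of names with the * operator
df.drop(*['col1', 'col2']).show()

# Both calls return a new DataFrame containing only the `id` column.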