According to the Scala API docs, calling:
dataFrame1.except(dataFrame2)
returns a new DataFrame containing the rows in dataFrame1 that do not appear in dataFrame2. Note that, like SQL's EXCEPT (DISTINCT), the result is also de-duplicated.
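A minimal sketch of the semantics, using plain Scala collections as a stand-in for DataFrames (no Spark session required; the actual Spark call is shown in a comment):

```scala
// Rows of two hypothetical tables, modeled as plain Scala sequences.
val df1 = Seq(("a", 1), ("a", 1), ("b", 2), ("c", 3))
val df2 = Seq(("b", 2))

// What except computes: distinct rows of df1 that are absent from df2.
// In Spark this would be: dataFrame1.except(dataFrame2)
val result = df1.distinct.filterNot(df2.toSet)

println(result) // List((a,1), (c,3))
```

Observe that the duplicate ("a", 1) row appears only once in the result, mirroring the distinct-set behavior of except.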