According to the Scala API docs, calling:
dataFrame1.except(dataFrame2)
returns a new DataFrame containing the rows in dataFrame1 that do not appear in dataFrame2. Note that, like SQL's EXCEPT (DISTINCT), the result is also de-duplicated.
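A minimal sketch of the semantics, using plain Scala collections as a stand-in for DataFrames (no Spark session required; the actual Spark call is shown in a comment):

```scala
// Rows of two hypothetical tables, modeled as plain Scala sequences.
val df1 = Seq(("a", 1), ("a", 1), ("b", 2), ("c", 3))
val df2 = Seq(("b", 2))

// What except computes: distinct rows of df1 that are absent from df2.
// In Spark this would be: dataFrame1.except(dataFrame2)
val result = df1.distinct.filterNot(df2.toSet)

println(result) // List((a,1), (c,3))
```

Observe that the duplicate ("a", 1) row appears only once in the result, mirroring the distinct-set behavior of except.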