Spark unionAll multiple dataframes

For pyspark you can do the following:

from functools import reduce
from pyspark.sql import DataFrame

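# df1, df2 and df3 are existing DataFrames with the same schema
dfs = [df1, df2, df3]
# Fold the list pairwise: (df1 unionAll df2) unionAll df3
df = reduce(DataFrame.unionAll, dfs)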

It’s also worth noting that the columns must be in the same order in every dataframe for this to work. unionAll matches columns by position, not by name, so it can silently give unexpected results if the column orders differ.
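One way to guard against a mismatched column order is to reorder every dataframe to match the first one before the union. The following is only a sketch: the helper name `union_all_aligned` is made up for illustration, and it assumes every dataframe has the same column names as the first.

```python
from functools import reduce


def union_all_aligned(dfs):
    """Union a non-empty list of DataFrames after aligning their column order.

    Hypothetical helper: assumes every DataFrame has the same column
    names as the first one, possibly in a different order.
    """
    if not dfs:
        raise ValueError("need at least one DataFrame")
    reference_columns = dfs[0].columns
    # select() returns a new DataFrame with the columns in the given order
    aligned = [df.select(*reference_columns) for df in dfs]
    # The positional union is now safe because the orders match
    return reduce(lambda left, right: left.unionAll(right), aligned)


# Usage, assuming df1, df2 and df3 already exist:
# df = union_all_aligned([df1, df2, df3])
```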

If you are using pyspark 2.3 or greater, you can use unionByName so you don’t have to reorder the columns.
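The same reduce pattern works with unionByName. This is a minimal sketch, assuming all dataframes have the same set of column names; the helper name `union_all_by_name` is made up for illustration.

```python
from functools import reduce


def union_all_by_name(dfs):
    """Union a non-empty list of DataFrames, matching columns by name.

    Hypothetical helper: relies on DataFrame.unionByName, which is
    available from Spark 2.3 onwards.
    """
    if not dfs:
        raise ValueError("need at least one DataFrame")
    # unionByName resolves columns by name rather than by position
    return reduce(lambda left, right: left.unionByName(right), dfs)


# Usage, assuming df1, df2 and df3 already exist:
# df = union_all_by_name([df1, df2, df3])
```

A lambda is used here instead of passing DataFrame.unionByName directly so the helper does not need to import pyspark itself.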
