PySpark converting a column of type 'map' to multiple columns in a dataframe

Since the keys of a MapType column are not part of the schema, you’ll have to collect them first, for example like this:

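from pyspark.sql.functions import explode

# explode() on a map column yields one row per entry, with columns "key" and "value"
keys = (df
    .select(explode("Parameters"))
    .select("key")
    .distinct()
    .rdd.flatMap(lambda x: x)  # unwrap each single-field Row into its bare key
    .collect())                # bring the distinct keys back to the driver as a list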

Once you have the keys, all that is left is a simple select:

from pyspark.sql.functions import col

# Build one column expression per key, named after the key
exprs = [col("Parameters").getItem(k).alias(k) for k in keys]
df.select(*exprs)
