Spark Dataframe validating column names for parquet writes

For anyone hitting this in PySpark: the error persisted for me even after renaming the columns. After a few iterations, what worked was renaming the columns and then re-reading the file with the renamed schema:

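# Assumes `spark` is an existing SparkSession.
file = "/opt/myfile.parquet"
df = spark.read.parquet(file)

# Strip spaces from every column name.
for c in df.columns:
    df = df.withColumnRenamed(c, c.replace(" ", ""))

# Re-read the file, applying the renamed schema.
df = spark.read.schema(df.schema).parquet(file)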
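
The loop above only removes spaces, but Spark's error message for Parquet column names lists more rejected characters: " ,;{}()\n\t=". The following is a minimal sketch of a more general cleanup, not part of the original answer. The helper names `sanitize` and `sanitized_columns` are hypothetical, and the collision check is an added assumption about what you would want when two cleaned names end up identical.

```python
import re

# Characters named in Spark's Parquet column-name error: " ,;{}()\n\t="
_INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize(name, replacement=""):
    """Return `name` with every rejected character replaced."""
    return _INVALID_CHARS.sub(replacement, name)


def sanitized_columns(columns, replacement=""):
    """Sanitize a list of column names, refusing to create duplicates."""
    cleaned = [sanitize(c, replacement) for c in columns]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("sanitizing produced duplicate column names: %r" % cleaned)
    return cleaned


# With a DataFrame `df`, all columns can then be renamed in one call:
# df = df.toDF(*sanitized_columns(df.columns))
```

`DataFrame.toDF(*names)` renames every column at once, which avoids chaining one `withColumnRenamed` call per column.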
