You can’t call `map` directly on a DataFrame, but you can convert the DataFrame to an RDD and map that with `spark_df.rdd.map()`. Prior to Spark 2.0, `spark_df.map` aliased to `spark_df.rdd.map()`; with Spark 2.0 and later, you must explicitly call `.rdd` first.