I got the same error. I solved it by installing the previous version of Spark (2.3 instead of 2.4). Now it works perfectly; it may be an issue with the latest version of pyspark.
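For reference, a minimal sketch of the downgrade, assuming PySpark was installed via pip (2.3.2 here is just an example 2.3.x patch release; pick whichever one is current):

```python
# Downgrade sketch, assuming a pip-based install (run these in a shell):
#   pip uninstall pyspark
#   pip install "pyspark==2.3.2"   # any 2.3.x release should behave the same

# Then verify the interpreter actually picks up the 2.3 line:
import pyspark

print(pyspark.__version__)  # should print 2.3.x after the downgrade
```

If the version printed is still 2.4.x, check that you are not mixing environments (e.g. a separate virtualenv or a SPARK_HOME pointing at another install).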