Spark gives a StackOverflowError when training using ALS

The solution was to add checkpointing. ALS is iterative, and each iteration extends the RDD lineage; once that lineage grows long enough, traversing it (for example during task serialization) overflows the stack. Checkpointing periodically materializes the RDDs to reliable storage and truncates the lineage. First, create a directory to store the checkpoints, then point your SparkContext at it. Here is the example in Python:

sc.setCheckpointDir('checkpoint/')
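To see why truncating a long lineage prevents the overflow, here is a plain-Python analogy (no Spark required, and not Spark's actual mechanism): a deeply nested chain of lazy computations blows the stack when finally evaluated, while "checkpointing" — materializing the value every few steps and restarting the chain — keeps the evaluation depth bounded.

```python
def inc(x):
    return x + 1

# Build a 5000-deep lazy composition, analogous to a long RDD lineage.
pipeline = lambda x: x
for _ in range(5000):
    pipeline = (lambda f: (lambda x: inc(f(x))))(pipeline)

try:
    pipeline(0)  # evaluating the whole chain at once recurses 5000 deep
except RecursionError:
    print("long chain overflows the stack")

# "Checkpointing": materialize the value every 100 steps, then restart the chain.
value = 0
pipeline = lambda x: x
for i in range(1, 5001):
    pipeline = (lambda f: (lambda x: inc(f(x))))(pipeline)
    if i % 100 == 0:
        value = pipeline(value)   # force evaluation, like checkpointing an RDD
        pipeline = lambda x: x    # the "lineage" is truncated here
value = pipeline(value)
print(value)  # 5000 — same result, but evaluation depth never exceeds ~100
```

The checkpointed version does the same 5000 increments, but no single evaluation ever nests more than about 100 calls deep.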

You may also want to set a checkpoint interval on ALS itself, though I haven't been able to determine whether that makes a difference here. To set one (probably not necessary), pass it as a parameter:

als = ALS(checkpointInterval=2)
