Spark gives a StackOverflowError when training using ALS

The solution was to add checkpointing. ALS is iterative, and each iteration extends the RDD lineage; once that lineage grows long enough, traversing it (for example during task serialization) overflows the stack. Checkpointing periodically materializes the RDDs to reliable storage and truncates the lineage. First, create a directory to store the checkpoints, then point your SparkContext at it. Here is the example in Python:

sc.setCheckpointDir('checkpoint/')
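To see why truncating a long lineage prevents the overflow, here is a plain-Python analogy (no Spark required, and not Spark's actual mechanism): a deeply nested chain of lazy computations blows the stack when finally evaluated, while "checkpointing" — materializing the value every few steps and restarting the chain — keeps the evaluation depth bounded.

```python
def inc(x):
    return x + 1

# Build a 5000-deep lazy composition, analogous to a long RDD lineage.
pipeline = lambda x: x
for _ in range(5000):
    pipeline = (lambda f: (lambda x: inc(f(x))))(pipeline)

try:
    pipeline(0)  # evaluating the whole chain at once recurses 5000 deep
except RecursionError:
    print("long chain overflows the stack")

# "Checkpointing": materialize the value every 100 steps, then restart the chain.
value = 0
pipeline = lambda x: x
for i in range(1, 5001):
    pipeline = (lambda f: (lambda x: inc(f(x))))(pipeline)
    if i % 100 == 0:
        value = pipeline(value)   # force evaluation, like checkpointing an RDD
        pipeline = lambda x: x    # the "lineage" is truncated here
value = pipeline(value)
print(value)  # 5000 — same result, but evaluation depth never exceeds ~100
```

The checkpointed version does the same 5000 increments, but no single evaluation ever nests more than about 100 calls deep.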

You may also want to set a checkpoint interval on ALS itself, though I haven't been able to determine whether that makes a difference here. To set one (probably not necessary), pass it as a parameter:

als = ALS(checkpointInterval=2)
