How do I get Python libraries in pyspark?

addPyFile is a method on the SparkContext instance (conventionally named sc), not on the class itself, so call it like this:

sc.addPyFile("module.py")  # also supports .zip archives

Quoting from the docs:

Add a .py or .zip dependency for all tasks to be executed on this
SparkContext in the future. The path passed can be either a local
file, a file in HDFS (or other Hadoop-supported filesystems), or an
HTTP, HTTPS or FTP URI.
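
Here is a minimal end-to-end sketch of that workflow, assuming a local helper file named mymodule.py that defines a function triple(x); both names are hypothetical and just for illustration:

from pyspark import SparkContext

# Use a local[*] master so the sketch runs without a cluster.
sc = SparkContext("local[*]", "addPyFileExample")

# Distribute the dependency to every executor; .zip archives work too.
sc.addPyFile("mymodule.py")  # hypothetical helper file

def use_module(x):
    # Import inside the task so the import resolves on the executors,
    # after addPyFile has shipped the file to them.
    import mymodule
    return mymodule.triple(x)

print(sc.parallelize([1, 2, 3]).map(use_module).collect())  # [3, 6, 9]
sc.stop()

Note the import happens inside the mapped function: executor processes only see the file after it has been distributed, so a top-level import on the driver would not help the workers.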
