Apache Spark: Get number of records per partition

I’d use the built-in spark_partition_id function. It should be as efficient as it gets:

import org.apache.spark.sql.functions.spark_partition_id

// Tag each row with the ID of the partition it belongs to,
// then count the rows per partition ID.
df.groupBy(spark_partition_id()).count()
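
For example, a minimal end-to-end sketch (assuming a SparkSession named spark is in scope; the DataFrame and column alias are illustrative):

import org.apache.spark.sql.functions.spark_partition_id

// 1000 rows spread evenly across 4 partitions.
val df = spark.range(0, 1000, 1, numPartitions = 4)

val counts = df.groupBy(spark_partition_id().alias("partition")).count()
counts.show()
// Expected: partition IDs 0 through 3, each with count 250
// (row order in the output is not guaranteed).

Note that the result is an ordinary DataFrame, so you can sort, filter, or collect it like any other aggregation result.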
