I’d use the built-in `spark_partition_id` function; it should be as efficient as it gets:
import org.apache.spark.sql.functions.spark_partition_id

df.groupBy(spark_partition_id()).count()
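For context, here is a minimal self-contained sketch of the same idea. It assumes a local SparkSession; the app name, master setting, and partition count are illustrative, not part of the original answer:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

object PartitionCounts {
  def main(args: Array[String]): Unit = {
    // Local session for demonstration only
    val spark = SparkSession.builder()
      .appName("partition-counts")
      .master("local[4]")
      .getOrCreate()

    // 100 rows spread across 4 partitions
    val df = spark.range(100).repartition(4)

    // One output row per partition, with that partition's record count
    df.groupBy(spark_partition_id().as("partition_id"))
      .count()
      .orderBy("partition_id")
      .show()

    spark.stop()
  }
}
```

Because `spark_partition_id()` is evaluated per record on the executors, this counts rows without collecting anything to the driver, unlike an `rdd.mapPartitions`-based count followed by a collect.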