Use a window function: `row_number()` over a window partitioned by the group column and ordered by the date column assigns each row a sequential number within its group.

from pyspark.sql.window import Window
from pyspark.sql.functions import row_number

df.withColumn("row_num", row_number().over(Window.partitionBy("Group").orderBy("Date")))