PySpark: add a sequential and deterministic index to a DataFrame

What I mean is: how can I add a column containing an ordered sequence 0:df.count that increases by exactly 1 per row? (from comments)

You can use row_number() here, but it requires a window with an orderBy(). Since you don't have an ordering column, just use monotonically_increasing_id() as the ordering. On its own, monotonically_increasing_id() produces unique, increasing IDs that are not consecutive (they jump between partitions), so it cannot serve as the 0:df.count index directly.

from pyspark.sql.functions import row_number, monotonically_increasing_id
from pyspark.sql import Window
df …