when using nested withColumns and window functions?
Let’s say I want to do:
w1 = ...rangeBetween(-300, 0)
w2 = ...rowsBetween(-1,0)
(df.withColumn("some1", col(f.max("original1").over(w1))
.withColumn("some2", lag("some1")).over(w2)).show()
I got a lot of memory problems and high spill even with very small datasets. If I do the same using select instead of withColumn it performs way faster.
df.select(
f.max(col("original1")).over(w1).alias("some1"),
f.lag("some1")).over(w2)
).show()