PySpark DataFrame Column Reference: df.col vs. df['col'] vs. F.col('col')?
In most practical applications, there is almost no difference. However, the three forms are implemented by calls to different underlying functions (source): df.col goes through DataFrame.__getattr__, df['col'] through DataFrame.__getitem__, and F.col('col') through pyspark.sql.functions.col. They are therefore not exactly the same. We can illustrate with a small example:

    df = spark.createDataFrame(
        [(1, 'a', 0), (2, 'b', None), (None, 'c', 3)],
        ['col', '2col', 'third col']
    )
    df.show()
    #+----+----+---------+
    #| col|2col|third col|
    #+----+----+---------+
    #|   1|   a| …