Spark load data and add filename as dataframe column

You can use input_file_name, which, according to the documentation:

Creates a string column for the file name of the current Spark task.

from pyspark.sql.functions import input_file_name

# withColumn returns a new DataFrame, so keep the result
df = df.withColumn("filename", input_file_name())

Same thing in Scala:

import org.apache.spark.sql.functions.input_file_name

df.withColumn("filename", input_file_name())
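One thing to be aware of: the column typically holds the full path or URI of the file (something like file:///data/in/a.csv), not just the short name. If you only want the last component, here is a minimal sketch of one way to get it. The names base_name, add_filename_columns and the "basename" column are hypothetical, made up for this illustration, and the PySpark imports sit inside the function so the plain-Python helper can be tried without a Spark installation.

```python
def base_name(uri: str) -> str:
    """Return the text after the last '/' in a path or URI.

    Hypothetical helper for illustration; assumes '/' is the separator.
    """
    return uri.rsplit("/", 1)[-1]


def add_filename_columns(df):
    """Add the full input path and its last component as two columns.

    Hypothetical helper; assumes df was read from files so that
    input_file_name() has something to report.
    """
    # Imported here so base_name stays usable without PySpark installed.
    from pyspark.sql.functions import input_file_name, udf
    from pyspark.sql.types import StringType

    short_name = udf(base_name, StringType())
    return (
        df.withColumn("filename", input_file_name())
          .withColumn("basename", short_name("filename"))
    )
```

A Python UDF is the simplest thing to show here, but it is slower than Spark's built-in string functions, so on large data you may prefer to do the same split with those instead.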
