Spark-Obtaining file name in RDDs
Since Spark 1.6 you can combine text data source and input_file_name function as follows: Scala: import org.apache.spark.sql.functions.input_file_name val inputPath: String = ??? spark.read.text(inputPath) .select(input_file_name, $”value”) .as[(String, String)] // Optionally convert to Dataset .rdd // or RDD Python: (Versions before 2.x are buggy and may not preserve names when converted to RDD): from pyspark.sql.functions import input_file_name … Read more