Specifying the filename when saving a DataFrame as a CSV

It’s not possible to set the output file name directly with Spark’s save.

Spark writes through Hadoop’s file output formats, which produce one file per partition inside an output directory: that’s why you get part- files. You can easily rename the file after the write has finished, just like in this question.
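To see where the part files come from, here is a minimal sketch that writes a small DataFrame as a single partition, so Spark produces exactly one part file inside the output directory. It assumes Spark 2.x or later on the classpath and a local run; the directory name mydata.csv-temp, the object name and the sample columns are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object WriteSinglePart {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("write-single-part")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "value")

    // coalesce(1) collapses the data into one partition, so Spark writes a single
    // part-*.csv file (plus a _SUCCESS marker) inside the mydata.csv-temp directory.
    df.coalesce(1)
      .write
      .mode("overwrite")
      .option("header", "true")
      .csv("mydata.csv-temp")

    spark.stop()
  }
}
```

Keep in mind that coalesce(1) funnels all data through a single task, which is fine for small outputs but can be slow for large ones.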

In Scala it looks like this:

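import org.apache.hadoop.fs._

// sc is the SparkContext (available as-is in spark-shell)
val fs = FileSystem.get(sc.hadoopConfiguration)
// Spark wrote a directory of part files to mydata.csv-temp; find the part file inside it
val file = fs.globStatus(new Path("mydata.csv-temp/part*"))(0).getPath().getName()

// Move it to the desired name, then delete the leftover directory (_SUCCESS, .crc files)
fs.rename(new Path("mydata.csv-temp/" + file), new Path("mydata.csv"))
fs.delete(new Path("mydata.csv-temp"), true)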
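The snippet above indexes (0) into the glob result, which silently picks one file if there are several and throws if there are none. A slightly more defensive helper, sketched under the assumption that the data was written as a single partition, checks that exactly one part file exists before renaming and reports a failed rename. The object and method names are hypothetical; only standard Hadoop FileSystem calls are used.

```scala
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object RenamePartFile {
  /** Moves the single part file in `srcDir` to `dest`, then deletes `srcDir`.
    * Fails if `srcDir` does not contain exactly one part file. */
  def moveSinglePart(fs: FileSystem, srcDir: Path, dest: Path): Unit = {
    // globStatus can return null when nothing matches; treat that as an empty result
    val parts = Option(fs.globStatus(new Path(srcDir, "part-*")))
      .getOrElse(Array.empty[FileStatus])
    require(parts.length == 1,
      s"expected exactly one part file in $srcDir, found ${parts.length}")

    // rename reports failure through its Boolean result rather than an exception
    if (!fs.rename(parts(0).getPath, dest))
      throw new java.io.IOException(s"could not rename ${parts(0).getPath} to $dest")

    // Remove the leftover output directory recursively
    fs.delete(srcDir, true)
  }
}
```

Usage in spark-shell would be RenamePartFile.moveSinglePart(FileSystem.get(sc.hadoopConfiguration), new Path("mydata.csv-temp"), new Path("mydata.csv")). Note that on HDFS the parent directory of the destination must already exist for rename to succeed.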

Or, if you already know the name of the part file:

import org.apache.hadoop.fs._
val fs = FileSystem.get(sc.hadoopConfiguration)
// Use the actual part file name; DataFrame writers usually add a unique suffix and an extension
fs.rename(new Path("csvDirectory/data.csv/part-00000"), new Path("csvDirectory/newData.csv"))

Edit: As mentioned in the comments, you can also write your own OutputFormat, which lets you control the file name; see the Hadoop documentation for details on this approach.
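One way to take the OutputFormat route, sketched under assumptions: Hadoop’s classic org.apache.hadoop.mapred.lib.MultipleTextOutputFormat lets a subclass choose each file’s name from the record key, and Spark’s saveAsHadoopFile on a pair RDD can use it. The class name KeyNamedTextOutputFormat, the output directory named-out and the naive comma join (no quoting, no header) are illustrative, and this is not necessarily the approach the commenters had in mind. The named file still lands inside an output directory.

```scala
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat
import org.apache.spark.sql.SparkSession

// Hypothetical OutputFormat that names each output file after the record's key
// instead of the default part-NNNNN name.
class KeyNamedTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Write only the value: TextOutputFormat omits a NullWritable key from the line
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()

  // Use the key as the file name; `name` (e.g. part-00000) is ignored
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString
}

object WriteNamedCsv {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("named-csv")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data
    val df = Seq(("alice", 1), ("bob", 2)).toDF("name", "value")

    // Naive CSV lines (no quoting, no header), all keyed by the target file name.
    // coalesce(1) keeps every record in one task, so only one writer creates mydata.csv.
    val keyed = df.rdd
      .map(row => ("mydata.csv", row.mkString(",")))
      .coalesce(1)

    // The output directory must not already exist; mydata.csv is written inside it
    keyed.saveAsHadoopFile(
      "named-out",
      classOf[String],
      classOf[String],
      classOf[KeyNamedTextOutputFormat])

    spark.stop()
  }
}
```

Because the file name comes from the key, the same class can also split one RDD into several named files by using different keys.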
