With more recent versions of Spark this has become a lot easier. Since Spark 1.4, the expression `sqlContext.read` gives you a `DataFrameReader` instance, and as of Spark 2.0 that reader also has a `.csv()` method:
```python
df = sqlContext.read.csv("/path/to/your.csv")
```
Note that you can also indicate that the CSV file has a header by passing the keyword argument `header=True` to the `.csv()` call. A handful of other options are available as well, described in the link above.