How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

The solution to your problem is to add the Spark Excel dependency (the com.crealytics spark-excel library) to your project.
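As a rough sketch, assuming an sbt build, the dependency could be declared as shown here. The version string is a placeholder, not a recommendation; pick a release that matches your Spark and Scala versions.

```scala
// build.sbt (sketch): "x.y.z" is a placeholder, replace it with a real
// spark-excel release compatible with your Spark and Scala versions.
libraryDependencies += "com.crealytics" %% "spark-excel" % "x.y.z"
```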

Spark Excel exposes a number of reader options, such as header handling, schema inference and sheet selection.

I have tested the following code, which reads an Excel file and converts it to a DataFrame, and it works:

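import org.apache.spark.sql.DataFrame

// Assumes a `sqlContext` is already in scope (for example in spark-shell).
// Option names vary between spark-excel releases (some expect the path in
// load(path) instead of "location"), so check the README for your version.
def readExcel(file: String): DataFrame = sqlContext.read
    .format("com.crealytics.spark.excel")
    .option("location", file)                   // path to the Excel file
    .option("useHeader", "true")                // first row holds the column names
    .option("treatEmptyValuesAsNulls", "true")  // empty cells become null
    .option("inferSchema", "true")              // infer column types from the data
    .option("addColorColumns", "false")         // no extra cell-color columns
    .load()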

val data = readExcel("path to your excel file")

data.show(false)

If your Excel workbook has multiple sheets, you can pass the sheet name as an option:

.option("sheetName", "Sheet2")
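To show where that option fits, here is a minimal sketch that collects the same options in a map and adds "sheetName" only when a sheet is requested. The names ExcelReader, excelOptions and readExcelSheet are made up for this illustration and are not part of spark-excel; the option keys are the ones used in the answer, so they may need adjusting for your library version.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical helper object, for illustration only.
object ExcelReader {
  // Builds the reader options from the answer; "sheetName" is added
  // only when a sheet name is supplied.
  def excelOptions(file: String, sheet: Option[String] = None): Map[String, String] = {
    val base = Map(
      "location" -> file,
      "useHeader" -> "true",
      "treatEmptyValuesAsNulls" -> "true",
      "inferSchema" -> "true",
      "addColorColumns" -> "false"
    )
    base ++ sheet.map(name => "sheetName" -> name)
  }

  // Reads the given file (and optionally a specific sheet) into a DataFrame.
  def readExcelSheet(sqlContext: SQLContext, file: String,
                     sheet: Option[String] = None): DataFrame =
    sqlContext.read
      .format("com.crealytics.spark.excel")
      .options(excelOptions(file, sheet))
      .load()
}
```

With this in place, ExcelReader.readExcelSheet(sqlContext, "path to your excel file", Some("Sheet2")) would read the second sheet, while omitting the last argument leaves the sheet choice to the library's default.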

I hope it's helpful.