Spark – load CSV file as DataFrame?

In Spark 2.0 and later, CSV support is part of core Spark and doesn't require a separate library, so you can just do, for example: df = spark.read.format("csv").option("header", "true").load("csvfile.csv")

In Scala on Spark 1.x, where CSV support comes from the external spark-csv package, the equivalent is: val df = sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").load("csvfile.csv")

The delimiter option works for any delimited format: use "," for CSV, "\t" for TSV, and so on.

Hadoop “Unable to load native-hadoop library for your platform” warning

I assume you're running Hadoop on 64-bit CentOS. The reason you saw that warning is that the native Hadoop library $HADOOP_HOME/lib/native/libhadoop.so.1.0.0 was actually compiled for 32-bit. Anyway, it's just a warning and won't affect Hadoop's functionality. If you do want to eliminate the warning, download the source code of Hadoop and …
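If you want to confirm that the bundled library really is a 32-bit build, you can look at its ELF header: the fifth byte (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. Here is a small sketch of that check; the function name elf_class is made up for illustration, and the library path is the one mentioned above, so adjust it if your Hadoop version ships a different file name.

```python
import os

ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # values of the EI_CLASS byte


def elf_class(path):
    """Return "32-bit", "64-bit", "unknown", or "not an ELF file" for path."""
    with open(path, "rb") as f:
        header = f.read(5)  # 4 magic bytes followed by EI_CLASS
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not an ELF file"
    return ELF_CLASSES.get(header[4], "unknown")


hadoop_home = os.environ.get("HADOOP_HOME")
if hadoop_home:
    # Path taken from the answer above; it may differ on your installation.
    lib_path = os.path.join(hadoop_home, "lib", "native", "libhadoop.so.1.0.0")
    if os.path.exists(lib_path):
        print(lib_path, "is", elf_class(lib_path))
    else:
        print("library not found at", lib_path)
else:
    print("HADOOP_HOME is not set")
```

On a 64-bit system, a result of "32-bit" here would match the explanation for the warning.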