data = sc.textFile('path_to_data')
header = data.first() #extract header
data = data.filter(lambda row: row != header) #filter out header
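The header-skipping logic above can be sketched without a cluster using a plain Python list in place of the RDD (`sc.parallelize(lines)` would replace the list in a real job; the filter predicate is identical):

```python
# Simulate an RDD as a plain list of CSV lines to illustrate the
# header-skipping pattern; Spark's filter() applies the same
# predicate lazily across partitions.
lines = ["id,name", "1,alice", "2,bob"]

header = lines[0]                                # analogous to data.first()
rows = [row for row in lines if row != header]   # analogous to data.filter(...)

print(rows)  # ['1,alice', '2,bob']
```

Note that this predicate drops *every* line identical to the header, which also removes repeated header lines when several files with the same header are read together.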