How to skip CSV header in Hive External Table?

As of Hive 0.13.0, you can use the skip.header.line.count table property:

create external table testtable (name string, message string)
row format delimited 
fields terminated by '\t' 
lines terminated by '\n' 
location '/testtable'
TBLPROPERTIES ("skip.header.line.count"="1");

Use ALTER TABLE for an existing table:

ALTER TABLE tablename
SET TBLPROPERTIES ("skip.header.line.count"="1");
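
To confirm the property took effect, something like the following may help. It is a minimal sketch: tablename is the same placeholder used above, and LIMIT 5 is an arbitrary sample size.

```sql
-- Show the current value of the property on the placeholder table
SHOW TBLPROPERTIES tablename("skip.header.line.count");

-- Spot-check a few rows: the header line should no longer appear
SELECT * FROM tablename LIMIT 5;
```

If the header still shows up in the sample, check that the property name and value were entered exactly as shown.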

Note that while this works, it has a caveat. The header is skipped once per file, so when more than one output file is generated (i.e. there is more than one reducer), the first record of each and every file is skipped, which may not be the desired behaviour.
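
If only some of the files carry a header, one possible workaround is to turn the automatic skipping off and filter the header row at query time instead. This is a sketch under assumptions, not a tested recipe: it reuses the testtable definition from above and assumes the header line's first field is the literal text 'name'.

```sql
-- Assumption: the header line's first field is the literal text 'name'.
-- Disable automatic skipping so that no real data rows are dropped.
ALTER TABLE testtable SET TBLPROPERTIES ("skip.header.line.count"="0");

-- Filter header rows out at query time instead.
-- The IS NULL check keeps rows whose name is NULL, which a bare <> would drop.
SELECT name, message
FROM testtable
WHERE name IS NULL OR name <> 'name';
```

The trade-off is that a genuine data row whose name happens to equal 'name' would also be filtered out, so the header text needs to be something that cannot occur in the data.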
