How to skip CSV header in Hive External Table?

As of Hive v0.13.0, you can use the skip.header.line.count table property: create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' TBLPROPERTIES ("skip.header.line.count"="1"); Use ALTER TABLE for an existing table: ALTER TABLE tablename SET TBLPROPERTIES ("skip.header.line.count"="1"); Please note that while it works, it comes with … Read more
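The effect of the property is simply that Hive drops the first N lines of each file in the table's location before parsing. A plain-Python sketch of that behaviour (not Hive itself), using hypothetical tab-delimited data matching the testtable layout:

```python
import csv
import io

# Hypothetical tab-delimited file with a header row,
# mirroring the testtable columns (name, message).
raw = "name\tmessage\nalice\thello\nbob\thi\n"

reader = csv.reader(io.StringIO(raw), delimiter="\t")
next(reader)  # skip.header.line.count=1: drop the first line of the file
rows = list(reader)
print(rows)  # [['alice', 'hello'], ['bob', 'hi']]
```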

Job queue for Hive action in oozie

A. Oozie specifics. Oozie propagates the "regular" Hadoop properties to a "regular" MapReduce action. But for other action types (Shell, Hive, Java, etc.), where Oozie runs a single mapper task in YARN, it does not consider the job a real MapReduce job. Hence it uses a different set of undocumented properties, always prefixed with … Read more
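A sketch of what this looks like in a workflow definition, assuming the commonly cited oozie.launcher. prefix for launcher-scoped properties and a hypothetical queue named myqueue (property names here are an assumption, not taken from the original answer):

```xml
<hive xmlns="uri:oozie:hive-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <!-- Queue for the single-mapper launcher job Oozie starts
             (assumed property name, launcher-scoped prefix). -->
        <property>
            <name>oozie.launcher.mapred.job.queue.name</name>
            <value>myqueue</value>
        </property>
        <!-- Queue for the MapReduce jobs the Hive query itself spawns. -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>myqueue</value>
        </property>
    </configuration>
    <script>myscript.sql</script>
</hive>
```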

Create HIVE Table with multi character delimiter

FIELDS TERMINATED BY does not support multi-character delimiters. The easiest way to do this is to use RegexSerDe: CREATE EXTERNAL TABLE tableex(id INT, name STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' WITH SERDEPROPERTIES ("input.regex" = "^(\\d+)~\\*(.*)$") STORED AS TEXTFILE LOCATION '/user/myusername';
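The capture groups in input.regex are what become the table's columns, so the pattern can be sanity-checked outside Hive before creating the table. A quick sketch with hypothetical sample rows, using Python's re (the ~* delimiter rows are made up for illustration):

```python
import re

# Same pattern as input.regex: digits, the literal "~*", then the rest.
# Group 1 maps to the id column, group 2 to the name column.
pattern = re.compile(r"^(\d+)~\*(.*)$")

for line in ["101~*alice", "102~*hello world"]:
    m = pattern.match(line)
    print(m.groups())  # e.g. ('101', 'alice')
```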

Export as csv in beeline hive

With Hive 0.11.0 or later, you can execute: INSERT OVERWRITE LOCAL DIRECTORY '/tmp/directoryWhereToStoreData' ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' SELECT * FROM yourTable; from hive/beeline to store the table in a directory on the local filesystem. Alternatively, with beeline, save your SELECT query in yourSQLFile.sql and run: beeline … Read more
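A sketch of the beeline variant, assuming a HiveServer2 at the default local address; the connection URL and output file name here are placeholders, and the exact flags should be checked against your beeline version:

```
beeline -u jdbc:hive2://localhost:10000 \
        --outputformat=csv2 \
        -f yourSQLFile.sql > result.csv
```

The --outputformat=csv2 option makes beeline print query results as comma-separated values, which the shell redirection then captures into a local file.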

Overwrite only some partitions in a partitioned spark Dataset

Since Spark 2.3.0, this is an option when overwriting a table. To use it, set the new spark.sql.sources.partitionOverwriteMode setting to dynamic; the dataset must be partitioned, and the write mode must be overwrite. Example in Scala: spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") data.write.mode("overwrite").insertInto("partitioned_table") I recommend doing a repartition based on your partition column before writing, … Read more
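The difference between the default static mode and dynamic mode can be illustrated without Spark. A plain-Python sketch modelling a partitioned table as a dict from partition value to rows (the partition values and rows are made up for illustration):

```python
# Existing table with two partitions, and new data touching only one of them.
table = {"2021-01-01": ["a", "b"], "2021-01-02": ["c"]}
new_data = {"2021-01-02": ["d"]}

# mode("overwrite") with partitionOverwriteMode=static (the default):
# the whole table is replaced by the new data.
static_result = dict(new_data)

# partitionOverwriteMode=dynamic: only the partitions present in the
# new data are replaced; the rest are left untouched.
dynamic_result = {**table, **new_data}

print(static_result)   # {'2021-01-02': ['d']}
print(dynamic_result)  # {'2021-01-01': ['a', 'b'], '2021-01-02': ['d']}
```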