Update, SET option in Hive

INSERT OVERWRITE TABLE _tableName_ PARTITION (_partitionColumn_ = _partitionValue_) SELECT [other Things], CASE WHEN id=206 THEN 'florida' ELSE location END AS location, [other Other Things] FROM _tableName_ WHERE [_whereClause_]; You can list multiple partitions by separating them with commas: … PARTITION (_partitionColumn_ = _partitionValue1_, _partitionColumn_ = _partitionValue2_, …). I haven’t done this with multiple partitions, just one at …
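As a concrete illustration of the pattern above, here is a minimal sketch. The table `customers`, its columns `id`, `name`, and `location`, and the partition column `load_date` are all hypothetical names chosen for the example; only the CASE expression comes from the excerpt.

```sql
-- Hypothetical table: customers(id INT, name STRING, location STRING)
-- partitioned by load_date STRING. All names are assumptions for illustration.
-- Rewrites one partition in place, changing location only for id 206.
INSERT OVERWRITE TABLE customers PARTITION (load_date = '2020-01-01')
SELECT
  id,
  name,
  CASE WHEN id = 206 THEN 'florida' ELSE location END AS location
FROM customers
WHERE load_date = '2020-01-01';
```

The SELECT list has to produce the non-partition columns in table order, because the partition value is supplied by the PARTITION clause. Every row of the partition that should survive must be selected, since the overwrite replaces the partition's contents.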

How to update partition metadata in Hive when partition data is manually deleted from HDFS

EDIT: Starting with Hive 3.0.0, MSCK can discover new partitions or remove missing partitions (or both) using the following syntax: MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS]. This was implemented in HIVE-17824. As correctly stated by HakkiBuyukcengiz, MSCK REPAIR doesn’t remove partitions if the corresponding folder on HDFS was manually deleted; it only …
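The three variants of that syntax could be used as sketched below. The table name `sales` and the partition column `dt` are hypothetical; the last statement is the ordinary ALTER TABLE way to drop a single partition's metadata by hand, shown as one option for versions without the DROP/SYNC keywords.

```sql
-- Hypothetical table: sales, partitioned by dt. Requires Hive 3.0.0 or later.
MSCK REPAIR TABLE sales ADD PARTITIONS;   -- register folders that exist on HDFS but not in the metastore
MSCK REPAIR TABLE sales DROP PARTITIONS;  -- remove metastore entries whose folders are gone
MSCK REPAIR TABLE sales SYNC PARTITIONS;  -- do both

-- Manual alternative: drop one partition's metadata explicitly.
ALTER TABLE sales DROP IF EXISTS PARTITION (dt = '2020-01-01');
```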

How to skip CSV header in Hive External Table?

As of Hive v0.13.0, you can use the skip.header.line.count table property: create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' TBLPROPERTIES ("skip.header.line.count"="1"); Use ALTER TABLE for an existing table: ALTER TABLE tablename SET TBLPROPERTIES ("skip.header.line.count"="1"); Please note that while it works, it comes with …

Hive Explode / Lateral View multiple arrays

I found a very good solution to this problem that doesn’t need any UDF: posexplode. SELECT COOKIE, ePRODUCT_ID, eCAT_ID, eQTY FROM TABLE LATERAL VIEW posexplode(PRODUCT_ID) ePRODUCT_ID AS seqp, ePRODUCT_ID LATERAL VIEW posexplode(CAT_ID) eCAT_ID AS seqc, eCAT_ID LATERAL VIEW posexplode(QTY) eQTY AS seqq, eQTY WHERE seqp = seqc AND seqc = …
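To make the position-matching idea easier to try, here is a small sketch with two arrays and inline sample data. The subquery, the literal values, and the aliases `product_ids`, `cat_ids`, `pos_p`, and `pos_c` are assumptions made up for the example; a SELECT without FROM needs Hive 0.13 or later.

```sql
-- Inline sample row with two parallel arrays (hypothetical data).
SELECT t.cookie, p.product_id, c.cat_id
FROM (
  SELECT 'c1' AS cookie,
         array('p1', 'p2') AS product_ids,
         array('x', 'y')   AS cat_ids
) t
-- posexplode emits (position, value) for each array element.
LATERAL VIEW posexplode(t.product_ids) p AS pos_p, product_id
LATERAL VIEW posexplode(t.cat_ids)     c AS pos_c, cat_id
-- Keep only the combinations where both values came from the same index.
WHERE pos_p = pos_c;
```

Without the WHERE clause the two lateral views produce the cross product of the arrays; filtering on equal positions keeps the elements paired by index.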

Create Table in Hive with one file

There are many possible solutions: 1) Add DISTRIBUTE BY on the partition key at the end of your query. There may be many partitions per reducer, with each reducer creating files for each partition. Distributing by the partition key may reduce the number of files as well as memory consumption. The hive.exec.reducers.bytes.per.reducer setting defines how much data each reducer will process. …
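The first suggestion might look like the sketch below. The tables `events_src` and `events_by_day`, their columns, and the 256 MB value are hypothetical; the dynamic-partition settings are included because the example writes to partitions chosen at run time.

```sql
-- Hypothetical tables: events_src(user_id, action, dt) and
-- events_by_day(user_id, action) partitioned by dt.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Roughly how many bytes each reducer should process (example value: 256 MB).
SET hive.exec.reducers.bytes.per.reducer=268435456;

INSERT OVERWRITE TABLE events_by_day PARTITION (dt)
SELECT user_id, action, dt
FROM events_src
-- Rows with the same dt go to the same reducer, so each partition
-- is written by one reducer instead of by many.
DISTRIBUTE BY dt;
```

The trade-off is skew: a partition much larger than the others is still handled by a single reducer.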

How to set variables in HIVE scripts

You need to use the special hiveconf namespace for variable substitution, e.g. hive> set CURRENT_DATE='2012-09-16'; hive> select * from foo where day >= ${hiveconf:CURRENT_DATE}; Similarly, you could pass it on the command line: % hive -hiveconf CURRENT_DATE='2012-09-16' -f test.hql Note that there are env and system variables as well, so you can reference ${env:USER} for example. To see …
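A script-side variant of the same substitution is sketched below. The variable name `run_date` is hypothetical, and `foo` and `day` are taken from the excerpt. The sketch stores the bare value and puts the quotes in the query, which is one way to avoid depending on how the shell handles quotes; the hivevar namespace is shown as an alternative that is not mentioned in the excerpt.

```sql
-- Substitution is textual: ${hiveconf:run_date} is replaced before the query is parsed.
SET run_date=2012-09-16;
SELECT * FROM foo WHERE day >= '${hiveconf:run_date}';

-- The hivevar namespace keeps user variables apart from Hive configuration keys.
SET hivevar:run_date=2012-09-16;
SELECT * FROM foo WHERE day >= '${hivevar:run_date}';

-- Environment variables are read-only and use the env namespace.
SELECT '${env:USER}';
```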