How to delete and update a record in Hive

As of Hive version 0.14.0, INSERT … VALUES, UPDATE, and DELETE are available with full ACID support.

INSERT … VALUES syntax:

    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] …)]
    VALUES values_row [, values_row …]

where values_row is ( value [, value …] ) and each value is either NULL or any valid SQL literal. UPDATE syntax: … Read more
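As a concrete sketch (the table name and columns below are hypothetical): ACID DML only works on transactional tables, which in Hive 0.14 also had to be bucketed and stored as ORC, with Hive transactions enabled in the configuration.

    -- Hypothetical table; ACID DML requires a bucketed, ORC-backed,
    -- transactional table (hive.txn.manager etc. must also be configured).
    CREATE TABLE students (id INT, name STRING, gpa DECIMAL(3,2))
    CLUSTERED BY (id) INTO 2 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    INSERT INTO TABLE students VALUES (1, 'alice', 3.50), (2, 'bob', 3.10);
    UPDATE students SET gpa = 3.75 WHERE id = 1;
    DELETE FROM students WHERE id = 2;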

Sqoop import : composite primary key and textual primary key

Specify the split column manually. The split column is not necessarily equal to the PK: you can have a composite PK and use some integer column as the split column. You can specify any integer column, or even a simple function (something simple like a substring or a cast, not an aggregate or analytic function). The split column should preferably be an evenly distributed integer. For example, if your … Read more
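For illustration, a hedged Sqoop invocation (the connection string, credentials, table, and column names are made up) that imports a table with a composite primary key by splitting on a plain integer column instead:

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl \
      --password-file /user/etl/.password \
      --table orders \
      --split-by order_seq \
      --num-mappers 8 \
      --target-dir /data/orders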

How to connect to remote hive server from spark [duplicate]

JDBC is not required: Spark connects directly to the Hive metastore, not through HiveServer2. To configure this, put hive-site.xml on your classpath and set hive.metastore.uris to wherever your Hive metastore is hosted. Also see How to connect to a Hive metastore programmatically in SparkSQL? Import org.apache.spark.sql.hive.HiveContext, as it can run SQL queries over Hive tables. Define … Read more
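A minimal Scala sketch, assuming Spark 1.x, a hive-site.xml on the classpath, and hive.metastore.uris pointing at the remote metastore (the thrift URI and table name below are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object RemoteHiveQuery {
      def main(args: Array[String]): Unit = {
        // hive-site.xml on the classpath should set, e.g.:
        //   hive.metastore.uris = thrift://metastore-host:9083   (hypothetical host)
        val sc = new SparkContext(new SparkConf().setAppName("remote-hive-query"))
        val hiveContext = new HiveContext(sc)
        // Any table registered in the remote metastore is now queryable.
        hiveContext.sql("SELECT count(*) FROM my_db.my_table").show()
      }
    }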

How to export a Hive table into a CSV file?

Or use this:

    hive -e 'select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

You can also specify set hive.cli.print.header=true before the SELECT to ensure that a header row is created and copied to the file along with the data. For example:

    hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

If you don't … Read more
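Note that the sed pipeline does not quote fields, so column values that themselves contain commas will break the CSV. A different sketch that avoids the shell pipeline entirely is Hive's own directory export (the output path is hypothetical, and the result is one or more delimited files in that directory rather than a single .csv):

    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/your_table_csv'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM your_Table;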

How to optimize partitioning when migrating data from JDBC source?

Determine how many partitions you need given the amount of input data and your cluster resources. As a rule of thumb, it is better to keep partition input under 1 GB unless strictly necessary, and strictly smaller than the block size limit. You've previously stated that you migrate 1TB of data values you use in different … Read more
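As a hedged sketch of what this looks like with Spark's JDBC source (assuming a SparkSession named spark; the URL, table, split column, and bounds are all hypothetical, and the split column should be integral and roughly evenly distributed):

    // ~1 TB of input at ~1 GB per partition suggests on the order of
    // 1000 partitions; lowerBound/upperBound bracket the split column.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/warehouse")
      .option("dbtable", "events")
      .option("user", "etl")
      .option("partitionColumn", "event_id")
      .option("lowerBound", "1")
      .option("upperBound", "2000000000")
      .option("numPartitions", "1000")
      .load()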

How to transpose/pivot data in hive?

Here is the approach I used to solve this problem with Hive's built-in UDF, map:

    select b.id, b.code,
           concat_ws('', b.p) as p,
           concat_ws('', b.q) as q,
           concat_ws('', b.r) as r,
           concat_ws('', b.t) as t
    from (
      select id, code,
             collect_list(a.group_map['p']) as p,
             collect_list(a.group_map['q']) as q,
             collect_list(a.group_map['r']) as r,
             collect_list(a.group_map['t']) as t
      from (
        select id, code, map(proc1, proc2) as … Read more
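For reference, a hedged reconstruction of how the complete query might look, assuming a source table test_sample(id, code, proc1, proc2) where proc1 holds the attribute name ('p', 'q', 'r', or 't') and proc2 holds its value (the table and column names are assumptions, not confirmed by the excerpt):

    -- Assumed source table: test_sample(id string, code string,
    --                                   proc1 string, proc2 string)
    select b.id, b.code,
           concat_ws('', b.p) as p,
           concat_ws('', b.q) as q,
           concat_ws('', b.r) as r,
           concat_ws('', b.t) as t
    from (
      select id, code,
             collect_list(a.group_map['p']) as p,
             collect_list(a.group_map['q']) as q,
             collect_list(a.group_map['r']) as r,
             collect_list(a.group_map['t']) as t
      from (
        -- map(key, value) builds a one-entry map per row; grouping then
        -- collects each attribute's values per (id, code).
        select id, code, map(proc1, proc2) as group_map
        from test_sample
      ) a
      group by a.id, a.code
    ) b;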