How to delete and update a record in Hive

As of Hive version 0.14.0, INSERT … VALUES, UPDATE, and DELETE are available with full ACID support.

INSERT … VALUES syntax:

    INSERT INTO TABLE tablename [PARTITION (partcol1[=val1], partcol2[=val2] …)]
    VALUES values_row [, values_row …]

where values_row is ( value [, value …] ) and each value is either NULL or any valid SQL literal. UPDATE syntax: … Read more
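As a concrete sketch (the table name and columns below are hypothetical): ACID DML only works on transactional tables, which in Hive 0.14 also had to be bucketed and stored as ORC, with Hive transactions enabled in the configuration.

    -- Hypothetical table; ACID DML requires a bucketed, ORC-backed,
    -- transactional table (hive.txn.manager etc. must also be configured).
    CREATE TABLE students (id INT, name STRING, gpa DECIMAL(3,2))
    CLUSTERED BY (id) INTO 2 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');

    INSERT INTO TABLE students VALUES (1, 'alice', 3.50), (2, 'bob', 3.10);
    UPDATE students SET gpa = 3.75 WHERE id = 1;
    DELETE FROM students WHERE id = 2;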

Sqoop import : composite primary key and textual primary key

Specify the split column manually. The split column is not necessarily equal to the PK: you can have a composite PK and use some integer column as the split column. You can specify any integer column, or even a simple function (something simple like a substring or a cast, not an aggregate or analytic function). The split column should preferably be an evenly distributed integer. For example, if your … Read more
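For illustration, a hedged Sqoop invocation (the connection string, credentials, table, and column names are made up) that imports a table with a composite primary key by splitting on a plain integer column instead:

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl \
      --password-file /user/etl/.password \
      --table orders \
      --split-by order_seq \
      --num-mappers 8 \
      --target-dir /data/orders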

How to connect to remote hive server from spark [duplicate]

JDBC is not required: Spark connects directly to the Hive metastore, not through HiveServer2. To configure this, put hive-site.xml on your classpath and set hive.metastore.uris to wherever your Hive metastore is hosted. Also see How to connect to a Hive metastore programmatically in SparkSQL? Import org.apache.spark.sql.hive.HiveContext, as it can run SQL queries over Hive tables. Define … Read more
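A minimal Scala sketch, assuming Spark 1.x, a hive-site.xml on the classpath, and hive.metastore.uris pointing at the remote metastore (the thrift URI and table name below are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object RemoteHiveQuery {
      def main(args: Array[String]): Unit = {
        // hive-site.xml on the classpath should set, e.g.:
        //   hive.metastore.uris = thrift://metastore-host:9083   (hypothetical host)
        val sc = new SparkContext(new SparkConf().setAppName("remote-hive-query"))
        val hiveContext = new HiveContext(sc)
        // Any table registered in the remote metastore is now queryable.
        hiveContext.sql("SELECT count(*) FROM my_db.my_table").show()
      }
    }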

How to export a Hive table into a CSV file?

Or use this:

    hive -e 'select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

You can also specify set hive.cli.print.header=true before the SELECT to ensure that a header row is created and copied to the file along with the data. For example:

    hive -e 'set hive.cli.print.header=true; select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

If you don't … Read more
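Note that the sed pipeline does not quote fields, so column values that themselves contain commas will break the CSV. A different sketch that avoids the shell pipeline entirely is Hive's own directory export (the output path is hypothetical, and the result is one or more delimited files in that directory rather than a single .csv):

    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/your_table_csv'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM your_Table;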

How to optimize partitioning when migrating data from JDBC source?

Determine how many partitions you need given the amount of input data and your cluster resources. As a rule of thumb, it is better to keep partition input under 1 GB unless strictly necessary, and strictly smaller than the block size limit. You've previously stated that you migrate 1TB of data values you use in different … Read more
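As a hedged sketch of what this looks like with Spark's JDBC source (assuming a SparkSession named spark; the URL, table, split column, and bounds are all hypothetical, and the split column should be integral and roughly evenly distributed):

    // ~1 TB of input at ~1 GB per partition suggests on the order of
    // 1000 partitions; lowerBound/upperBound bracket the split column.
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://dbhost:5432/warehouse")
      .option("dbtable", "events")
      .option("user", "etl")
      .option("partitionColumn", "event_id")
      .option("lowerBound", "1")
      .option("upperBound", "2000000000")
      .option("numPartitions", "1000")
      .load()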

How to transpose/pivot data in hive?

Here is the approach I used to solve this problem with Hive's built-in UDF, map:

    select b.id, b.code,
           concat_ws('', b.p) as p,
           concat_ws('', b.q) as q,
           concat_ws('', b.r) as r,
           concat_ws('', b.t) as t
    from (
      select id, code,
             collect_list(a.group_map['p']) as p,
             collect_list(a.group_map['q']) as q,
             collect_list(a.group_map['r']) as r,
             collect_list(a.group_map['t']) as t
      from (
        select id, code, map(proc1, proc2) as … Read more
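For reference, a hedged reconstruction of how the complete query might look, assuming a source table test_sample(id, code, proc1, proc2) where proc1 holds the attribute name ('p', 'q', 'r', or 't') and proc2 holds its value (the table and column names are assumptions, not confirmed by the excerpt):

    -- Assumed source table: test_sample(id string, code string,
    --                                   proc1 string, proc2 string)
    select b.id, b.code,
           concat_ws('', b.p) as p,
           concat_ws('', b.q) as q,
           concat_ws('', b.r) as r,
           concat_ws('', b.t) as t
    from (
      select id, code,
             collect_list(a.group_map['p']) as p,
             collect_list(a.group_map['q']) as q,
             collect_list(a.group_map['r']) as r,
             collect_list(a.group_map['t']) as t
      from (
        -- map(key, value) builds a one-entry map per row; grouping then
        -- collects each attribute's values per (id, code).
        select id, code, map(proc1, proc2) as group_map
        from test_sample
      ) a
      group by a.id, a.code
    ) b;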