How can I convert array to string in hive sql?

Use the concat_ws(string delimiter, array<string>) function to concatenate the array:

    select actor, concat_ws(',', collect_set(date)) as grpdate
    from actor_table
    group by actor;

If the date field is not a string, convert it to string first:

    concat_ws(',', collect_set(cast(date as string)))

Read also this answer about alternative ways if you already have an array (of int) and do not want to explode it … Read more
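For intuition, the collect_set / concat_ws pair amounts to deduplicating values per group and joining them with the delimiter. A plain-Python sketch of that semantics (the actor_rows sample data is hypothetical; note that Hive's collect_set does not guarantee element order):

```python
# Hypothetical rows: (actor, date) pairs standing in for actor_table.
actor_rows = [
    ("alice", "2020-01-01"),
    ("alice", "2020-02-01"),
    ("alice", "2020-01-01"),  # duplicate, dropped by collect_set
    ("bob",   "2020-03-01"),
]

def group_dates(rows):
    """Mimic: select actor, concat_ws(',', collect_set(date)) ... group by actor."""
    groups = {}
    for actor, date in rows:
        # collect_set: keep unique dates per actor (dict keys preserve insertion order)
        groups.setdefault(actor, {})[date] = None
    # concat_ws: join the collected strings with the delimiter
    return {actor: ",".join(dates) for actor, dates in groups.items()}

print(group_dates(actor_rows))
# {'alice': '2020-01-01,2020-02-01', 'bob': '2020-03-01'}
```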

Array Intersection in Spark SQL

Since Spark 2.4 the array_intersect function can be used directly in SQL:

    spark.sql(
      "SELECT array_intersect(array(1, 42), array(42, 3)) AS intersection"
    ).show()
    +------------+
    |intersection|
    +------------+
    |        [42]|
    +------------+

and in the Dataset API:

    import org.apache.spark.sql.functions.array_intersect

    Seq((Seq(1, 42), Seq(42, 3)))
      .toDF("a", "b")
      .select(array_intersect($"a", $"b") as "intersection")
      .show()
    +------------+
    |intersection|
    +------------+
    |        [42]|
    +------------+

Equivalent functions are also present in the … Read more
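The semantics of array_intersect are: keep the elements of the first array that also appear in the second, without duplicates. A plain-Python sketch of that behavior (not the Spark implementation):

```python
def array_intersect(a, b):
    """Mimic Spark's array_intersect: elements of a that also appear in b,
    in the order they occur in a, with duplicates removed."""
    b_set = set(b)
    seen = set()
    result = []
    for x in a:
        if x in b_set and x not in seen:
            seen.add(x)
            result.append(x)
    return result

print(array_intersect([1, 42], [42, 3]))  # [42]
```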

How to update table in Hive 0.13?

You can use row_number or a full join. This is an example using row_number:

    insert overwrite table table_1
    select customer_id, items, price, updated_date
    from
    (
      select customer_id, items, price, updated_date,
             row_number() over (partition by customer_id order by new_flag desc) rn
      from
      (
        select customer_id, items, price, updated_date, 0 as new_flag
        from table_1
        union all
        select customer_id, items, price, updated_date, … Read more
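The idea behind the row_number trick: union the existing rows (flagged 0) with the incoming rows (flagged 1, assumed from the truncated excerpt), then keep one row per customer_id, preferring the higher flag. A plain-Python sketch of that merge logic, with hypothetical sample rows:

```python
def upsert(old_rows, new_rows):
    """Mimic row_number() over (partition by customer_id order by new_flag desc):
    for each customer_id keep the new row if one exists, else the old one.
    Rows are (customer_id, items, price, updated_date) tuples."""
    # Tag rows like the union all does: old rows get new_flag = 0, new rows get 1.
    tagged = [(row, 0) for row in old_rows] + [(row, 1) for row in new_rows]
    best = {}
    for row, flag in tagged:
        customer_id = row[0]
        # Keep the row that would get rn = 1 (highest new_flag wins).
        if customer_id not in best or flag > best[customer_id][1]:
            best[customer_id] = (row, flag)
    return [row for row, _ in best.values()]

old = [("c1", "itemA", 10, "2020-01-01"), ("c2", "itemB", 20, "2020-01-01")]
new = [("c2", "itemB", 25, "2020-02-01")]
print(upsert(old, new))
```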

Find last day of a month in Hive

As of Hive 1.1.0, the last_day(string date) function is available. last_day(string date) returns the last day of the month to which the date belongs. date is a string in the format 'yyyy-MM-dd HH:mm:ss' or 'yyyy-MM-dd'. The time part of date is ignored.
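On Hive versions before 1.1.0 you would have to compute this yourself. A plain-Python sketch of the same semantics, using the standard library:

```python
import calendar
import datetime

def last_day(date_str):
    """Mimic Hive's last_day(): return 'yyyy-MM-dd' for the last day of the month.
    Accepts 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'; the time part is ignored."""
    d = datetime.datetime.strptime(date_str[:10], "%Y-%m-%d").date()
    # monthrange returns (weekday of first day, number of days in the month)
    _, num_days = calendar.monthrange(d.year, d.month)
    return d.replace(day=num_days).isoformat()

print(last_day("2015-02-10"))           # 2015-02-28
print(last_day("2016-02-10 12:30:00"))  # 2016-02-29 (leap year)
```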

Hive Explode / Lateral View multiple arrays

I found a very good solution to this problem without using any UDF; posexplode works well here:

    SELECT COOKIE, ePRODUCT_ID, eCAT_ID, eQTY
    FROM TABLE
    LATERAL VIEW posexplode(PRODUCT_ID) ePRODUCT_ID AS seqp, ePRODUCT_ID
    LATERAL VIEW posexplode(CAT_ID) eCAT_ID AS seqc, eCAT_ID
    LATERAL VIEW posexplode(QTY) eQTY AS seqq, eQTY
    WHERE seqp = seqc AND seqc = … Read more
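The trick is that posexplode emits a (position, value) pair for each array element; the WHERE clause keeps only the combinations where all positions match, effectively zipping the parallel arrays. A plain-Python sketch of that logic:

```python
def explode_parallel(product_ids, cat_ids, qtys):
    """Mimic three posexplode lateral views joined on equal positions:
    the cross product is filtered down to aligned elements."""
    rows = []
    for seqp, product_id in enumerate(product_ids):
        for seqc, cat_id in enumerate(cat_ids):
            for seqq, qty in enumerate(qtys):
                if seqp == seqc == seqq:  # the WHERE clause keeps aligned positions
                    rows.append((product_id, cat_id, qty))
    return rows

print(explode_parallel([10, 11], [7, 8], [1, 2]))
# [(10, 7, 1), (11, 8, 2)]
```

In practice zip(product_ids, cat_ids, qtys) computes the same result directly; the nested loops above mirror how the SQL cross join plus position filter arrives at it.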

Explode (transpose?) multiple columns in Spark SQL table

Spark >= 2.4

You can skip the zip udf and use the arrays_zip function:

    df.withColumn("vars", explode(arrays_zip($"varA", $"varB"))).select(
      $"userId", $"someString",
      $"vars.varA", $"vars.varB").show

Spark < 2.4

What you want is not possible without a custom UDF. In Scala you could do something like this:

    val data = sc.parallelize(Seq(
      """{"userId": 1, "someString": "example1", "varA": [0, 2, 5], "varB": [1, 2, … Read more
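Conceptually, arrays_zip pairs the two arrays element by element, and explode then produces one output row per pair. A plain-Python sketch of the combined effect (the sample row is hypothetical; the truncated varB values above are not reproduced here):

```python
def explode_arrays_zip(row):
    """Mimic explode(arrays_zip(varA, varB)): one output row per aligned pair,
    with the scalar columns repeated on every row."""
    out = []
    for a, b in zip(row["varA"], row["varB"]):
        out.append({"userId": row["userId"], "someString": row["someString"],
                    "varA": a, "varB": b})
    return out

row = {"userId": 1, "someString": "example1", "varA": [0, 2, 5], "varB": [1, 2, 9]}
for r in explode_arrays_zip(row):
    print(r)
```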