Thanks to this inspired blog post, I’ve been able to craft a solution. It works as follows:
- Create a look-up table to effectively ‘iterate’ over the elements of each array. The number of rows in this table has to be equal to or greater than the maximum number of elements in any array. Let’s say this is 4 (it can be calculated with `SELECT MAX(JSON_ARRAY_LENGTH(metadata)) FROM input_table`):

  ```sql
  CREATE VIEW seq_0_to_3 AS
  SELECT 0 AS i UNION ALL
  SELECT 1 UNION ALL
  SELECT 2 UNION ALL
  SELECT 3;
  ```
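To see why the cross join with a sequence works, here is a minimal Python sketch of the same idea (the sample rows are hypothetical, not taken from any real table): pair every row with every sequence value, then keep only the pairs whose index falls inside that row’s array.

```python
import json

# Hypothetical sample data mirroring input_table: (id, metadata JSON array).
rows = [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
]

seq = range(4)  # plays the role of the seq_0_to_3 view

# Cross join rows with seq, then filter i < array length --
# one output row per array element, just like the SQL WHERE clause.
exploded = [
    (row_id, json.loads(meta)[i])
    for row_id, meta in rows
    for i in seq
    if i < len(json.loads(meta))
]
```

The filter is what turns the blind cross join into a per-row iteration: rows with short arrays simply discard the surplus sequence values.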
- From this, we can create one row per JSON element:

  ```sql
  WITH exploded_array AS (
      SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
      FROM input_table, seq_0_to_3 AS seq
      WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
  )
  SELECT * FROM exploded_array;
  ```
  Producing:

  ```
  id | json
  ---+------------------------
   1 | {"pet":"dog"}
   1 | {"country":"uk"}
   2 | {"pet":"cat"}
   4 | {"country":"germany"}
   4 | {"education":"masters"}
   4 | {"country":"belgium"}
  ```
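If you want to try the explode step without a Redshift cluster, SQLite (bundled with Python) can stand in, assuming your SQLite build includes the JSON1 functions: `json_array_length` replaces `JSON_ARRAY_LENGTH`, and `json_extract` with a `'$[i]'` path replaces `JSON_EXTRACT_ARRAY_ELEMENT_TEXT`. This is a local sketch of the same query, not Redshift syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE input_table (id INTEGER, metadata TEXT)")
cur.executemany("INSERT INTO input_table VALUES (?, ?)", [
    (1, '[{"pet":"dog"},{"country":"uk"}]'),
    (2, '[{"pet":"cat"}]'),
    (4, '[{"country":"germany"},{"education":"masters"},{"country":"belgium"}]'),
])

# seq_0_to_3 built exactly as in the Redshift version.
cur.execute("""
    CREATE VIEW seq_0_to_3 AS
    SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
""")

# SQLite's json_extract(metadata, '$[i]') stands in for
# JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, i).
rows = cur.execute("""
    SELECT id, json_extract(metadata, '$[' || seq.i || ']') AS json
    FROM input_table, seq_0_to_3 AS seq
    WHERE seq.i < json_array_length(metadata)
    ORDER BY id, seq.i
""").fetchall()
```

The result set has one row per array element, matching the table above.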
- However, I needed to extract the field names/values. As I can’t see any way to extract JSON field names using Redshift’s limited functions, I’ll do this using a regular expression (note that Redshift requires the inner derived table to be aliased):

  ```sql
  WITH exploded_array AS (
      SELECT id, JSON_EXTRACT_ARRAY_ELEMENT_TEXT(metadata, seq.i) AS json
      FROM input_table, seq_0_to_3 AS seq
      WHERE seq.i < JSON_ARRAY_LENGTH(metadata)
  )
  SELECT id, field, JSON_EXTRACT_PATH_TEXT(json, field) AS value
  FROM (
      SELECT id, json, REGEXP_SUBSTR(json, '[^{"]\\w+[^"]') AS field
      FROM exploded_array
  ) AS fields;
  ```
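To sanity-check what that pattern actually captures, here is the same regular expression applied in Python (after Redshift’s string-literal escaping, `'[^{"]\\w+[^"]'` reaches the engine as `[^{"]\w+[^"]`). This is only an illustration of the pattern’s behaviour, not part of the query.

```python
import re

# The pattern as the regex engine sees it: one character that is neither
# '{' nor '"', one or more word characters, then one non-quote character.
# For a single-key object like {"pet":"dog"} the first such run is the
# field name. Caveat: it relies on the name being at least 3 characters
# long; shorter names would make the match slide into the value instead.
FIELD_RE = re.compile(r'[^{"]\w+[^"]')

def field_name(json_text):
    """Return the first field name found in a single-key JSON object."""
    match = FIELD_RE.search(json_text)
    return match.group(0) if match else None
```

Running it against the exploded rows above recovers `pet`, `country`, and `education` as expected.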