How to flatten a nested json array?

Here’s a partial solution:

First save your data in the same directory as the script as a JSON file called data.json.

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('data.json') as json_file:
    json_data = json.load(json_file)

new_data = json_data['data']['workbooks']

result = json_normalize(new_data, ['embeddedDatasources', 'upstreamTables'], ['projectName', 'name', 'createdAt', 'updatedAt', 'owner', 'site'], record_prefix='_')

result 

Output:

_name _schema _database.name _database.connectionType projectName name createdAt updatedAt owner site
0 table_1 schema_1 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {‘name’: ‘user1’, ‘username’: ‘John’} {‘name’: ‘site1’}
1 table_2 schema_2 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {‘name’: ‘user1’, ‘username’: ‘John’} {‘name’: ‘site1’}
2 table_3 schema_3 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {‘name’: ‘user1’, ‘username’: ‘John’} {‘name’: ‘site1’}
3 table_4 schema_1 testdb redshift TestProject wkb1 2020-12-13T15:38:58Z 2020-12-13T15:38:59Z {‘name’: ‘user1’, ‘username’: ‘John’} {‘name’: ‘site1’}

What next?

I think if you re-structure the data a bit in advance (for example flattening 'database': {'name': 'testdb', 'connectionType': 'redshift'}) you will be able to add more fields to the meta parameter.

As you see in the documentation of json_normalize, the four parameters that are used here are:

  1. data: dict or list of dicts :

    • Unserialized JSON objects.
  2. record_path: str or list of str : default None

    • Path in each object to list of records. If not passed, data will be assumed to be an array of records.
  3. meta: list of paths (str or list of str) : default None

    • Fields to use as metadata for each record in resulting table.
  4. record_prefix: str : default None

    • If True, prefix records with dotted (?) path, e.g. foo.bar.field if path to records is [‘foo’, ‘bar’].

Leave a Comment