Here’s a partial solution:
First save your data in the same directory as the script as a JSON file
called data.json
.
import json
import pandas as pd
from pandas.io.json import json_normalize
with open('data.json') as json_file:
json_data = json.load(json_file)
new_data = json_data['data']['workbooks']
result = json_normalize(new_data, ['embeddedDatasources', 'upstreamTables'], ['projectName', 'name', 'createdAt', 'updatedAt', 'owner', 'site'], record_prefix='_')
result
Output:
_name | _schema | _database.name | _database.connectionType | projectName | name | createdAt | updatedAt | owner | site | |
---|---|---|---|---|---|---|---|---|---|---|
0 | table_1 | schema_1 | testdb | redshift | TestProject | wkb1 | 2020-12-13T15:38:58Z | 2020-12-13T15:38:59Z | {‘name’: ‘user1’, ‘username’: ‘John’} | {‘name’: ‘site1’} |
1 | table_2 | schema_2 | testdb | redshift | TestProject | wkb1 | 2020-12-13T15:38:58Z | 2020-12-13T15:38:59Z | {‘name’: ‘user1’, ‘username’: ‘John’} | {‘name’: ‘site1’} |
2 | table_3 | schema_3 | testdb | redshift | TestProject | wkb1 | 2020-12-13T15:38:58Z | 2020-12-13T15:38:59Z | {‘name’: ‘user1’, ‘username’: ‘John’} | {‘name’: ‘site1’} |
3 | table_4 | schema_1 | testdb | redshift | TestProject | wkb1 | 2020-12-13T15:38:58Z | 2020-12-13T15:38:59Z | {‘name’: ‘user1’, ‘username’: ‘John’} | {‘name’: ‘site1’} |
What next?
I think if you re-structure the data a bit in advance (for example flattening 'database': {'name': 'testdb', 'connectionType': 'redshift'}
) you will be able to add more fields
to the meta
parameter.
As you see in the documentation of json_normalize, the four parameters that are used here are:
-
data:
dict or list of dicts
:- Unserialized JSON objects.
-
record_path:
str or list of str
: default None- Path in each object to list of records. If not passed, data will be assumed to be an array of records.
-
meta:
list of paths (str or list of str)
: default None- Fields to use as metadata for each record in resulting table.
-
record_prefix:
str
: default None- If True, prefix records with dotted (?) path, e.g. foo.bar.field if path to records is [‘foo’, ‘bar’].