Functions that help to understand json(dict) structure

Here are a family of recursive generators that can be used to search through an object composed of dicts and lists. find_key yields a tuple containing a list of the dictionary keys and list indices that lead to the key that you pass in; the tuple also contains the value associated with that key. Because it’s a generator it will find all matching keys if the object contains multiple matching keys, if desired.

def find_key(obj, key):
    if isinstance(obj, dict):
        yield from iter_dict(obj, key, [])
    elif isinstance(obj, list):
        yield from iter_list(obj, key, [])

def iter_dict(d, key, indices):
    for k, v in d.items():
        if k == key:
            yield indices + [k], v
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

def iter_list(seq, key, indices):
    for k, v in enumerate(seq):
        if isinstance(v, dict):
            yield from iter_dict(v, key, indices + [k])
        elif isinstance(v, list):
            yield from iter_list(v, key, indices + [k])

# test

data = {
    '1_data': {
        '4_data': [
            {'5_data': 'hooray'},
            {'3_data': 'hooray2'}
        ], 
        '2_data': []
    }
}

for t in find_key(data, '3_data'):
    print(t)

output

(['1_data', '4_data', 1, '3_data'], 'hooray2')

To get a single key list you can pass find_key to the next function. And if you want to use a key list to fetch the associated value you can use a simple for loop.

seq, val = next(find_key(data, '3_data'))
print('seq:', seq, 'val:', val)

obj = data
for k in seq:
    obj = obj[k]
print('obj:', obj, obj == val)

output

seq: ['1_data', '4_data', 1, '3_data'] val: hooray2
obj: hooray2 True

If the key may be missing, then give next an appropriate default tuple. Eg:

seq, val = next(find_key(data, '6_data'), ([], None))
print('seq:', seq, 'val:', val)
if seq:
    obj = data
    for k in seq:
        obj = obj[k]
    print('obj:', obj, obj == val)

output

seq: [] val: None

Note that this code is for Python 3. To run it on Python 2 you need to replace all the yield from statements, eg replace

yield from iter_dict(obj, key, [])

with

for u in iter_dict(obj, key, []):
    yield u

How it works

To understand how this code works you need to be familiar with recursion and with Python generators. You may also find this page helpful: Understanding Generators in Python; there are also various Python generators tutorials available online.

The Python object returned by json.load or json.loads is generally a dict, but it can also be a list. We pass that object to the find_key generator as the obj arg, along with the key string that we want to locate. find_key then calls either iter_dict or iter_list, as appropriate, passing them the object, the key, and an empty list indices, which is used to collect the dict keys and list indices that lead to the key we want.

iter_dict iterates over each (k, v) pair at the top level of its d dict arg. If k matches the key we’re looking for then the current indices list is yielded with k appended to it, along with the associated value. Because iter_dict is recursive the yielded (indices list, value) pairs get passed up to the previous level of recursion, eventually making their way up to find_key and then to the code that called find_key. Note that this is the “base case” of our recursion: it’s the part of the code that determines whether this recursion path leads to the key we want. If a recursion path never finds a key matching the key we’re looking for then that recursion path won’t add anything to indices and it will terminate without yielding anything.

If the current v is a dict, then we need to examine all the (key, value) pairs it contains. We do that by making a recursive call to iter_dict, passing that v is its starting object and the current indices list. If the current v is a list we instead call iter_list, passing it the same args.

iter_list works similarly to iter_dict except that a list doesn’t have any keys, it only contains values, so we don’t perform the k == key test, we just recurse into any dicts or lists that the original list contains.

The end result of this process is that when we iterate over find_key we get pairs of (indices, value) where each indices list is the sequence of dict keys and list indices that succesfully terminate in a dict item with our desired key, and value is the value associated with that particular key.

If you’d like to see some other examples of this code in use please see how to modify the key of a nested Json and How can I select deeply nested key:values from dictionary in python.

Also take look at my new, more streamlined show_indices function.

Leave a Comment