How do I use the ‘json’ module to read in one JSON object at a time?

Generally speaking, putting more than one JSON object into a file makes that file invalid, broken JSON. That said, you can still parse data in chunks using the JSONDecoder.raw_decode() method.

The following will yield complete objects as the parser finds them:

from json import JSONDecoder
from functools import partial


def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    buffer=""
    for chunk in iter(partial(fileobj.read, buffersize), ''):
         buffer += chunk
         while buffer:
             try:
                 result, index = decoder.raw_decode(buffer)
                 yield result
                 buffer = buffer[index:].lstrip()
             except ValueError:
                 # Not enough data to decode, read more
                 break

This function will read chunks from the given file object in buffersize chunks, and have the decoder object parse whole JSON objects from the buffer. Each parsed object is yielded to the caller.

Use it like this:

with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process object

Use this only if your JSON objects are written to a file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document, in which case you can use Loading and parsing a JSON file with multiple JSON objects in Python instead.

Leave a Comment