FastAPI – How to get the response body in Middleware

The response body is an (async) iterator; once it has been consumed, it cannot be iterated again. Thus, you either have to save all the iterated data to a list (or a bytes variable) and use that to return a custom Response, or re-initialise the iterator. The options below demonstrate both approaches. In case you would like to get the request body inside the middleware as well, please have a look at this answer.

Option 1

Save the data to a list and use iterate_in_threadpool to re-initialise the iterator, as described here – which is what StreamingResponse uses, as shown here.

from fastapi import FastAPI, Request
from starlette.concurrency import iterate_in_threadpool

app = FastAPI()

@app.middleware("http")
async def some_middleware(request: Request, call_next):
    response = await call_next(request)
    # Consume the body iterator, collecting every chunk in a list
    response_body = [chunk async for chunk in response.body_iterator]
    # Restore the body iterator, so the response can still be sent to the client
    response.body_iterator = iterate_in_threadpool(iter(response_body))
    print(f"response_body={response_body[0].decode()}")
    return response

Note 1: If your code uses StreamingResponse, response_body[0] would return only the first chunk of the response. To get the entire response body, you should join that list of bytes (chunks), as shown below (.decode() returns a string representation of the bytes object):

print(f"response_body={(b''.join(response_body)).decode()}")

Note 2: If you have a StreamingResponse streaming a body that wouldn't fit into your server's RAM (for example, a response of 30GB), you may run into memory errors when iterating over the response.body_iterator (this applies to both options listed in this answer). To avoid that, you could loop through response.body_iterator (as shown in Option 2), but instead of storing the chunks in an in-memory variable, store them somewhere on disk. You would then need to retrieve the entire response data from that disk location and load it into RAM, in order to send it back to the client, which could extend the delay in responding even further. In that case, you could load the contents into RAM in chunks and use StreamingResponse, similar to what has been demonstrated here, here, as well as here, here and here (in Option 1, you can just pass your iterator/generator function to iterate_in_threadpool); a minimal sketch of this disk-based approach follows below. However, I would not suggest following that approach; instead, exclude such endpoints returning large streaming responses from the middleware, as described in this answer.
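As a rough illustration of the disk-based approach just described, the sketch below spills the body chunks to a temporary file and then streams that file back to the client. The file_chunks helper, the 64 KiB chunk size and the use of a BackgroundTask to delete the temporary file after the response has been sent are illustrative choices, not part of the original answer:

import os
import tempfile

from fastapi import FastAPI, Request
from starlette.background import BackgroundTask
from starlette.responses import StreamingResponse

app = FastAPI()

@app.middleware("http")
async def some_middleware(request: Request, call_next):
    response = await call_next(request)
    # Spill the body chunks to a temporary file, instead of keeping them in RAM
    tmp = tempfile.NamedTemporaryFile(delete=False)
    async for chunk in response.body_iterator:
        tmp.write(chunk)
    tmp.close()
    # ... inspect/log the body from disk here, as needed ...

    def file_chunks(path, chunk_size=64 * 1024):
        # Read the file back in fixed-size chunks (hypothetical helper)
        with open(path, "rb") as f:
            while data := f.read(chunk_size):
                yield data

    # Stream the file back; delete it once the response has been sent
    return StreamingResponse(
        file_chunks(tmp.name),
        status_code=response.status_code,
        headers=dict(response.headers),
        media_type=response.media_type,
        background=BackgroundTask(os.unlink, tmp.name),
    )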

Option 2

The below demonstrates another approach, where the response body is stored in a bytes object (instead of a list, as shown above) and is used to return a custom Response directly (along with the status_code, headers and media_type of the original response).

@app.middleware("http")
async def some_middleware(request: Request, call_next):
    response = await call_next(request)
    response_body = b""
    async for chunk in response.body_iterator:
        response_body += chunk
    print(f"response_body={response_body.decode()}")
    return Response(content=response_body, status_code=response.status_code, 
        headers=dict(response.headers), media_type=response.media_type)
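
As suggested at the end of Note 2, a simple way to keep endpoints returning large streaming responses out of the body-capture logic is to check the request path inside the middleware and pass such responses through untouched. The sketch below is a minimal illustration; the SKIP_PATHS set and the "/download" path are hypothetical:

from fastapi import FastAPI, Request, Response

app = FastAPI()

# Hypothetical set of endpoints that return large streaming responses
SKIP_PATHS = {"/download"}

@app.middleware("http")
async def some_middleware(request: Request, call_next):
    response = await call_next(request)
    if request.url.path in SKIP_PATHS:
        # Pass large streaming responses through without touching the body
        return response
    response_body = b""
    async for chunk in response.body_iterator:
        response_body += chunk
    print(f"response_body={response_body.decode()}")
    return Response(content=response_body, status_code=response.status_code,
                    headers=dict(response.headers), media_type=response.media_type)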
