FastAPI runs API calls serially instead of in parallel

As per FastAPI’s documentation:

When you declare a path operation function with normal def instead
of async def, it is run in an external threadpool that is then
awaited, instead of being called directly (as it would block the
server).

Thus, def (sync) routes run in a separate thread from an external threadpool, and hence, the server can process such requests concurrently, whereas async def routes run directly on the main (single-threaded) event loop, and hence, the server processes such requests sequentially, as long as there is no await call to some I/O-bound operation inside those routes, such as waiting for data from the client to be sent through the network, for the contents of a file on disk to be read, or for a database operation to finish (have a look here).

Asynchronous code with async and await is often summarised as using coroutines. Coroutines are collaborative (or cooperatively multitasked): “at any given time, a program with coroutines is running only one of its coroutines, and this running coroutine suspends its execution only when it explicitly requests to be suspended” (see here and here for more info on coroutines).

However, this does not apply to CPU-bound operations, such as the ones described here (e.g., audio or image processing, machine learning). CPU-bound operations, even if declared in async def functions and called using await, will block the main thread. This also means that a blocking operation, such as time.sleep(), in an async def route will block the entire server (as in your case).
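For illustration, below is a minimal sketch of such a blocking route, essentially the endpoint from your question, reconstructed here for illustration: since time.sleep() inside an async def route keeps the event loop busy, a second request cannot be served until the first one completes.

import time

from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
async def ping():
    print("Hello")
    time.sleep(5)  # blocking call inside async def: the event loop is frozen for 5 s
    print("bye")
    return "pong"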

Thus, if your function is not going to make any async calls, you could declare it with def instead, as shown below:

import time

from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
def ping(request: Request):
    # print(request.client)
    print("Hello")
    time.sleep(5)
    print("bye")
    return "pong"

Otherwise, if you are going to call async functions that you have to await, you should use async def. To demonstrate this, the example below uses the asyncio.sleep() function from the asyncio library. A similar example is given here and here as well.

import asyncio

from fastapi import FastAPI, Request

app = FastAPI()

@app.get("/ping")
async def ping(request: Request):
    print("Hello")
    await asyncio.sleep(5)
    print("bye")
    return "pong"

Both functions above will print the expected output, as mentioned in your question, if two requests arrive at around the same time:

Hello
Hello
bye
bye

Note: When you call your endpoint for the second (third, and so on) time, please remember to do so from a tab that is isolated from the browser’s main session; otherwise, the requests will be shown as coming from the same client (you could check that using print(request.client); the port number would appear to be the same, if both tabs were opened in the same window), and hence, the requests would be processed sequentially. You could either reload the same tab while the first request is still running, open a new tab in an incognito window, or use another browser/client to send the request.
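Alternatively, you could take the browser out of the equation entirely and send two requests concurrently from a script. Below is a minimal sketch using asyncio.gather() together with the httpx library (assuming httpx is installed and the server is listening locally on port 8000; adjust the URL as needed):

import asyncio

import httpx

async def main():
    url = "http://127.0.0.1:8000/ping"
    async with httpx.AsyncClient(timeout=20) as client:
        # fire both requests at (around) the same time
        responses = await asyncio.gather(client.get(url), client.get(url))
        print([r.text for r in responses])

asyncio.run(main())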

Async/await and Expensive CPU-bound Operations (Long Computation Tasks)

If you are required to use async def (as you might need to await coroutines inside your route), but also have some synchronous long computation task that might be blocking the server and doesn’t let other requests go through, for example:

from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/ping")
async def ping(file: UploadFile = File(...)):
    print("Hello")
    try:
        contents = await file.read()
        res = some_long_computation_task(contents)  # this blocks other requests
    finally:
        await file.close()
    print("bye")
    return "pong"

then:

  1. Use more workers (e.g., uvicorn main:app --workers 4). Note: Each worker “has its own things, variables and memory”. This means that global variables/objects, etc., won’t be shared across the processes/workers (a minimal sketch illustrating this follows below). In this case, you should consider using a database storage, or Key-Value stores (Caches), as described here and here. Additionally, “if you are consuming a large amount of memory in your code, each process will consume an equivalent amount of memory”.
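     As a minimal (hypothetical) sketch of why that matters, consider the counter endpoint below: started with --workers 4, repeated calls will return inconsistent values, since each worker process increments its own copy of the variable.

    from fastapi import FastAPI

    app = FastAPI()
    counter = 0  # each worker process holds its own copy of this variable

    @app.get("/count")
    def count():
        global counter
        counter += 1
        return {"counter": counter}  # values diverge across workers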

  2. Use FastAPI’s (Starlette’s) run_in_threadpool() from the concurrency module (source code here and here), as @tiangolo suggested here, which “will run the function in a separate thread to ensure that the main thread (where coroutines are run) does not get blocked” (see here). As described by @tiangolo here, “run_in_threadpool is an awaitable function, the first parameter is a normal function, the next parameters are passed to that function directly. It supports sequence arguments and keyword arguments”. A fuller example follows the snippet below.

    from fastapi.concurrency import run_in_threadpool
    res = await run_in_threadpool(some_long_computation_task, contents)
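     Put together with the earlier endpoint, this could look like the sketch below (some_long_computation_task remains the placeholder from your example):

    from fastapi import FastAPI, File, UploadFile
    from fastapi.concurrency import run_in_threadpool

    app = FastAPI()

    @app.post("/ping")
    async def ping(file: UploadFile = File(...)):
        try:
            contents = await file.read()
            # runs in a separate thread, keeping the event loop free for other requests
            res = await run_in_threadpool(some_long_computation_task, contents)
        finally:
            await file.close()
        return "pong"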
    
  3. Alternatively, use asyncio’s run_in_executor:

    loop = asyncio.get_running_loop()
    res = await loop.run_in_executor(None, lambda: some_long_computation_task(contents))
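     The lambda above is one way to pass arguments to the function; functools.partial from the standard library is a common alternative (passing None as the executor means the loop’s default ThreadPoolExecutor is used):

    import functools

    loop = asyncio.get_running_loop()
    # None -> run in the loop's default ThreadPoolExecutor
    res = await loop.run_in_executor(None, functools.partial(some_long_computation_task, contents))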
    
  4. You should also check whether you could change your route’s definition to def. For example, if the only method in your endpoint that has to be awaited is the one reading the file contents (as you mentioned in the comments section below), you could instead declare the parameter with the bytes type and let FastAPI read the bytes of the file for you (note, though, that this should work for small files only, as the whole contents will be stored in memory, see here; a sketch of that approach follows the example below), or you could even call the read() method of the SpooledTemporaryFile object directly, so that you don’t have to await the read() method. Since you can then declare your route with def, each request will run in a separate thread, as shown below:

    @app.post("/ping")
    def ping(file: UploadFile = File(...)):
        print("Hello")
        try:
            contents = file.file.read()
            res = some_long_computation_task(contents)
        finally:
            file.file.close()
        print("bye")
        return "pong"
    
  5. Have a look at this answer, as well as the documentation here, for more suggested solutions.
