The biggest advantage of non-blocking or asynchronous I/O is that your thread can continue its work while the I/O is in progress. Of course, you can also achieve this with an additional thread. As you stated, for best overall (system) performance I guess it would be better to use asynchronous I/O rather than multiple threads, because that reduces thread switching.
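To make the "thread can continue its work" point concrete, here is a minimal, portable sketch using a non-blocking socket in Python. It is only an illustration, not Windows overlapped I/O: the socket pair, the `work_done` counter and the one-second timeout are all assumptions made up for this example.

```python
import select
import socket

# A connected socket pair stands in for a client connection (assumption).
reader, writer = socket.socketpair()
reader.setblocking(False)  # recv() now returns immediately instead of waiting

work_done = 0
try:
    reader.recv(16)  # nothing has been sent yet
    data_was_ready = True
except BlockingIOError:
    # No data available: the thread is free to do something else.
    data_was_ready = False
    work_done += 1

# The peer sends data at some later point.
writer.sendall(b"ping")

# Wait (up to one second) until the socket is readable, then read.
readable, _, _ = select.select([reader], [], [], 1.0)
data = reader.recv(16) if readable else b""

reader.close()
writer.close()
```

The same thread issues the read, does other work when nothing is available, and picks up the data once it has arrived, without a second thread being involved.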
Let’s look at possible implementations of a network server program that has to handle 1000 clients connected at the same time:
- One thread per connection (can be blocking I/O, but can also be non-blocking I/O).
  Each thread requires memory resources (also kernel memory!), which is a disadvantage. And every additional thread means more work for the scheduler.
- One thread for all connections.
  This takes load off the system because we have fewer threads. But it also prevents you from using the full performance of your machine, because you might end up driving one processor to 100% while all other processors sit idle.
- A few threads, where each thread handles some of the connections.
  This takes load off the system because there are fewer threads, and it can still use all available processors. On Windows this approach is supported by the Thread Pool API.
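The third variant (a few threads sharing many connections) could be sketched roughly as follows. This uses Python's `concurrent.futures.ThreadPoolExecutor` purely to show the structure; it is not the Windows Thread Pool API, and `handle_connection`, `NUM_WORKERS` and the connection IDs are hypothetical names chosen for the example. Also note that in CPython the global interpreter lock limits CPU parallelism, so this shows the shape of the design rather than its performance.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

NUM_CONNECTIONS = 1000  # figure taken from the text
NUM_WORKERS = 4         # assumed pool size, roughly "one thread per processor"


def handle_connection(conn_id: int) -> tuple[int, str]:
    # Placeholder for per-connection work (hypothetical); it only reports
    # which worker thread served the connection.
    return conn_id, threading.current_thread().name


# A small, fixed set of worker threads serves all connections.
with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
    results = list(pool.map(handle_connection, range(NUM_CONNECTIONS)))

handled = len(results)
workers_used = {name for _, name in results}
```

All 1000 connections are served, but the scheduler never has to deal with more than `NUM_WORKERS` server threads.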
Of course, having more threads is not a problem per se. As you may have noticed, I chose quite a high number of connections/threads. I doubt that you’ll see any difference between the three possible implementations if we are talking about only a dozen threads (this is also what Raymond Chen suggests in the MSDN blog post Does Windows have a limit of 2000 threads per process?).
On Windows, using unbuffered file I/O means that the size of each write must be a multiple of the volume sector size. I have not tested it, but it sounds like writing in such aligned sizes could also have a positive effect on write performance for buffered synchronous and asynchronous writes.
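As a small illustration of what "a multiple of a fixed size" means in practice, the following sketch rounds a write size up to the next aligned size and pads a buffer accordingly. The value 4096 is only an assumed alignment unit, and `round_up` and `pad_to_alignment` are hypothetical helper names; real code would query the required value from the system instead of hard-coding it.

```python
ALIGNMENT = 4096  # assumed alignment unit in bytes, not queried from the OS


def round_up(size: int, alignment: int = ALIGNMENT) -> int:
    """Return the smallest multiple of `alignment` that is >= `size`."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return -(-size // alignment) * alignment  # ceiling division, then scale


def pad_to_alignment(data: bytes, alignment: int = ALIGNMENT) -> bytes:
    """Pad `data` with zero bytes so its length is a multiple of `alignment`."""
    # The padding becomes part of what is written, so the caller has to
    # keep track of the logical length separately.
    return data.ljust(round_up(len(data), alignment), b"\0")


padded = pad_to_alignment(b"abc")
```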
The steps 1 to 7 you describe give a good idea of how it works. On Windows the operating system will inform you about completion of an asynchronous I/O (`WriteFile` with `OVERLAPPED` structure) using an event or a callback. Callback functions will only be called for example when your code calls `WaitForMultipleObjectsEx` with `bAlertable` set to `true`.
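The two notification styles (wait on an event, or get a callback) can be illustrated with a loose, portable analogy in Python. This is not the Win32 API: it uses `concurrent.futures`, where the "I/O" runs on a helper thread and the callback runs on that helper thread or on the registering thread, whereas a Windows completion routine runs on the issuing thread during an alertable wait. `simulated_write` and `on_complete` are hypothetical names for this sketch.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def simulated_write(data: bytes) -> int:
    # Stand-in for an asynchronous write (assumption); returns bytes "written".
    return len(data)


callback_results: list[int] = []
callback_fired = threading.Event()


def on_complete(future: Future) -> None:
    # Callback style: invoked once the operation has finished.
    callback_results.append(future.result())
    callback_fired.set()


with ThreadPoolExecutor(max_workers=1) as pool:
    # Style 1: start the operation, then wait for its completion signal.
    first = pool.submit(simulated_write, b"hello")
    bytes_written = first.result(timeout=5)

    # Style 2: start the operation and register a completion callback.
    second = pool.submit(simulated_write, b"world!")
    second.add_done_callback(on_complete)
    callback_fired.wait(timeout=5)
```

In both styles the code that issued the operation decides when it deals with the completion; the difference is only whether it polls or waits for a signal itself, or is handed the result through a function it registered.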
Some more reading on the web:
- Multiple Threads in the User Interface on MSDN, which also briefly covers the cost of creating threads
- Section Threads and Thread Pools says “Although threads are relatively easy to create and use, the operating system allocates a significant amount of time and other resources to manage them.”
- CreateThread documentation on MSDN says “However, your application will have better performance if you create one thread per processor and build queues of requests for which the application maintains the context information.”
- Old article Why Too Many Threads Hurts Performance, and What to do About It