How much overhead is there when creating a thread?

To resurrect this old thread, I just did some simple test code:

#include <thread>

int main(int argc, char** argv)
{
  for (volatile int i = 0; i < 500000; i++)
    std::thread([](){}).detach();
  return 0;
}

I compiled it with g++ test.cpp -std=c++11 -lpthread -O3 -o test. I then ran it three times in a row on an old (kernel 2.6.18) heavily loaded (doing a database rebuild) slow laptop (Intel core i5-2540M). Results from three consecutive runs: 5.647s, 5.515s, and 5.561s. So we’re looking at a tad over 10 microseconds per thread on this machine, probably much less on yours.

That’s not much overhead at all, given that serial ports max out at around 1 bit per 10 microseconds. Now, of course there’s various additional thread losses one can get involving passed/captured arguments (although function calls themselves can impose some), cache slowdowns between cores (if multiple threads on different cores are battling over the same memory at the same time), etc. But in general I highly doubt the use case you presented will adversely impact performance at all (and could provide benefits, depending), despite having you already preemptively labeled the concept “really terrible code” without even knowing how much time it takes to launch a thread.

Whether it’s a good idea or not depends a lot on the details of your situation. What else is the calling thread responsible for? What precisely is involved in preparing and writing out the packets? How frequently are they written out (with what sort of distribution? uniform, clustered, etc…?) and what’s their structure like? How many cores does the system have? Etc. Depending on the details, the optimal solution could be anywhere from “no threads at all” to “shared thread pool” to “thread for each packet”.

Note that thread pools aren’t magic and can in some cases be a slowdown versus unique threads, since one of the biggest slowdowns with threads is synchronizing cached memory used by multiple threads at the same time, and thread pools by their very nature of having to look for and process updates from a different thread have to do this. So either your primary thread or child processing thread can get stuck having to wait if the processor isn’t sure whether the other process has altered a section of memory. By contrast, in an ideal situation, a unique processing thread for a given task only has to share memory with its calling task once (when it’s launched) and then they never interfere with each other again.

Leave a Comment