How is the fork/join framework better than a thread pool?

I think the basic misunderstanding is that the Fork/Join examples do NOT show work stealing, only a standard kind of divide and conquer.

Work stealing would be like this: Worker B has finished his work. He is a kind one, so he looks around and sees Worker A still working very hard. He strolls over and says: “Hey lad, I could give you a hand.” A replies: “Cool, I have this task of 1000 units. So far I have finished 345, leaving 655. Could you please work on numbers 673 to 1000? I’ll do 346 to 672.” B says: “OK, let’s start so we can go to the pub earlier.”
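The numbers in that dialogue are just the remaining range cut at its midpoint. A minimal sketch of the arithmetic, where `RangeSplit` and `splitRemaining` are hypothetical names of mine and not part of any library:

```java
// Hypothetical helper: how worker A might hand over half of his remaining range.
final class RangeSplit {
    /** Returns {aFrom, aTo, bFrom, bTo} (inclusive bounds) for the remaining units next..last. */
    static int[] splitRemaining(int next, int last) {
        int mid = (next + last) / 2;              // (346 + 1000) / 2 = 673
        return new int[] {next, mid - 1, mid, last};
    }

    public static void main(String[] args) {
        int[] r = splitRemaining(346, 1000);
        // prints "A keeps 346..672, B takes 673..1000"
        System.out.printf("A keeps %d..%d, B takes %d..%d%n", r[0], r[1], r[2], r[3]);
    }
}
```

The point is that this split happens after A has already started working, which is exactly what makes it a hand-over rather than planning done upfront.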

You see: the workers must communicate with each other even after they have started the real work. This is the part missing from the examples.

The examples, on the other hand, show only something like “use subcontractors”:

Worker A: “Dang, I have 1000 units of work. Too much for me. I’ll do 500 myself and subcontract 500 to someone else.” This goes on until the big task is broken down into small packets of 10 units each. These are then executed by the available workers. But if one packet is a kind of poison pill and takes considerably longer than the others, bad luck: the divide phase is over and nobody can help with it.

The only remaining difference between Fork/Join and splitting the task upfront is this: when splitting upfront, the work queue is full right from the start. Example: 1000 units with a threshold of 10 gives a queue of 100 entries. These packets are distributed to the thread pool members.
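For comparison, splitting upfront with a plain thread pool might look roughly like this. It is only a sketch: the class name `UpfrontSplit`, the choice of four threads, and summing the numbers as a stand-in for "real work" are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class UpfrontSplit {
    static final int THRESHOLD = 10; // packet size from the example above

    /** Cuts 1..units into packets of THRESHOLD units before any work starts. */
    static List<Callable<Long>> buildPackets(int units) {
        List<Callable<Long>> packets = new ArrayList<>();
        for (int from = 1; from <= units; from += THRESHOLD) {
            final int lo = from;
            final int hi = Math.min(from + THRESHOLD - 1, units);
            packets.add(() -> {
                long sum = 0;
                for (int i = lo; i <= hi; i++) sum += i; // stand-in for one unit of work
                return sum;
            });
        }
        return packets;
    }

    /** Submits all packets at once and adds up their results. */
    static long run(int units, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long total = 0;
            // invokeAll queues every packet immediately: the queue is full from the start.
            for (Future<Long> f : pool.invokeAll(buildPackets(units))) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPackets(1000).size() + " packets, sum = " + run(1000, 4));
    }
}
```

All 100 packets exist before the first one runs, so there is no later point at which a slow packet could be cut up further.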

Fork/Join is more complex and tries to keep the number of packets in the queue smaller:

  • Step 1: Put one packet containing (1…1000) into the queue.
  • Step 2: One worker pops the packet (1…1000) and replaces it with two packets: (1…500) and (501…1000).
  • Step 3: One worker pops packet (501…1000) and pushes (501…750) and (751…1000).
  • Step n: The queue (used like a stack) contains these packets: (1…500), (501…750), (751…875), … (991…1000).
  • Step n+1: Packet (991…1000) is popped and executed.
  • Step n+2: Packet (981…990) is popped and executed.
  • Step n+3: Packet (961…980) is popped and split into (961…970) and (971…980).
    …
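The steps above correspond roughly to the usual `RecursiveTask` pattern. Here is a minimal sketch, assuming a threshold of 10 and a simple sum as the work; the class name `RangeSum` and the helper `sum` are mine. Note that plain halving does not produce packets of exactly 10 units, only packets of at most 10.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class RangeSum extends RecursiveTask<Long> {
    static final int THRESHOLD = 10; // largest packet that is executed without splitting
    private final int from;
    private final int to;            // both bounds inclusive

    RangeSum(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from + 1 <= THRESHOLD) {
            // "Work" phase: the packet is small enough, so execute it.
            long sum = 0;
            for (int i = from; i <= to; i++) sum += i;
            return sum;
        }
        // "Split" phase: replace this packet with two halves.
        int mid = (from + to) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid + 1, to);
        left.fork();                      // e.g. (1..500) is queued for later
        long rightSum = right.compute();  // e.g. (501..1000) is split further right away
        return left.join() + rightSum;
    }

    /** Runs the whole range in a pool with the given parallelism (an assumed helper). */
    static long sum(int from, int to, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new RangeSum(from, to));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(1, 1000, 4)); // prints 500500
    }
}
```

Only one half is queued per split while the other half is processed immediately, which is why the number of waiting packets stays small.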

You see: in Fork/Join the queue stays smaller (6 entries in the example) and the “split” and “work” phases are interleaved.

When multiple workers are popping and pushing simultaneously, the interactions are of course not so clear.
