Poor scaling of multiprocessing Pool.map() on a list of large objects: How to achieve better parallel scaling in python?

your work function ends too soon:

In [2]: %timeit func(1)
335 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

so you are basically measuring the overhead of multiprocessing.

change your work function to do more work, e.g. loop 1000 * 1000 times rather than 1000 times, and you will see it scale again. 1,000,000 loops cost roughly 0.4 s on my mac, which is high enough compared to the overhead.
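for illustration, a minimal sketch (the original `func` is not shown in the question, so the loop body below is only a stand-in for CPU-bound work; the 4 in `Pool(4)` matches the core count used here):

```python
import time
from multiprocessing import Pool

def func(x, n_inner=1000 * 1000):
    # stand-in CPU-bound work: n_inner controls the cost of each call
    total = 0
    for i in range(n_inner):
        total += i * x
    return total

if __name__ == "__main__":
    data = list(range(20))

    start = time.perf_counter()
    serial = [func(x) for x in data]
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(4) as pool:                 # 4 worker processes, one per core
        parallel = pool.map(func, data)
    t_mp = time.perf_counter() - start

    print(f"single: {t_single:.2f}s  mp: {t_mp:.2f}s  "
          f"speedup: {t_single / t_mp:.2f}x")
```

with `n_inner=1000` instead, the same script spends most of its parallel time on pickling and process communication, and the speedup drops below 1.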

below are the test results for different n on my mac. I use Pool(4) as I have 4 cores. Each test runs only once rather than multiple times like %timeit, since the difference is insignificant:

speedup graph

you can see the speedup ratio increases with n, because the per-call overhead of multiprocessing is amortized over more work in each call.
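a rough sketch of how such a sweep could be run (the helper names and the workload below are assumptions, not the exact test code used for the graph):

```python
import time
from multiprocessing import Pool

def func(x, n_inner):
    # busy loop standing in for real work; n_inner controls cost per call
    total = 0
    for i in range(n_inner):
        total += i * x
    return total

def speedup(pool, data, n_inner):
    start = time.perf_counter()
    for x in data:
        func(x, n_inner)
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    pool.starmap(func, [(x, n_inner) for x in data])
    t_mp = time.perf_counter() - start
    return t_single / t_mp

if __name__ == "__main__":
    data = list(range(50))
    # create the pool once so its startup cost is not counted in each run
    with Pool(4) as pool:
        for n_inner in (1000, 10_000, 100_000, 1_000_000):
            print(n_inner, round(speedup(pool, data, n_inner), 2))
```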

the math behind it, assuming the per-call overhead is constant:

ratio = \frac{time_{single}}{time_{mp}} = \frac{cost_{work} \cdot n}{\frac{cost_{work} \cdot n}{p_{cores}} + cost_{overhead} \cdot n} = \frac{1}{\frac{1}{p_{cores}} + \frac{cost_{overhead}}{cost_{work}}}

if we want ratio > 1:

1 - \frac{1}{p_{cores}} > \frac{cost_{overhead}}{cost_{work}}

which, since 1 - \frac{1}{p_{cores}} is close to 1 once you have several cores, is approximately:

cost_{work} > cost_{overhead}

which means that if the work function runs too fast compared with the per-call overhead, multiprocessing does not scale.
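plugging illustrative numbers into the formula above (the ~1 ms per-call overhead below is an assumed value for the sake of the example; 335 µs and 0.4 s are the timings quoted earlier):

```python
# predicted speedup from the formula above; overhead value is illustrative
def predicted_ratio(cost_work, cost_overhead, p_cores):
    return 1 / (1 / p_cores + cost_overhead / cost_work)

print(predicted_ratio(cost_work=335e-6, cost_overhead=1e-3, p_cores=4))  # ~0.31, slower than serial
print(predicted_ratio(cost_work=0.4,    cost_overhead=1e-3, p_cores=4))  # ~3.96, close to 4x
```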
