Poor scaling of multiprocessing Pool.map() on a list of large objects: How to achieve better parallel scaling in python?

your work function ends too soon:

In [2]: %timeit func(1)
335 µs ± 12.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

so you are basically measuring the overhead of multiprocessing.

change your work function to do more work, e.g. loop 1000 * 1000 times rather than 1000 times, and you will see it scale again. 1,000,000 loops cost roughly 0.4 s on my mac, which is high enough compared to the overhead.
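for illustration, a minimal sketch (the original `func` is not shown in the question, so the loop body below is only a stand-in for CPU-bound work; the 4 in `Pool(4)` matches the core count used here):

```python
import time
from multiprocessing import Pool

def func(x, n_inner=1000 * 1000):
    # stand-in CPU-bound work: n_inner controls the cost of each call
    total = 0
    for i in range(n_inner):
        total += i * x
    return total

if __name__ == "__main__":
    data = list(range(20))

    start = time.perf_counter()
    serial = [func(x) for x in data]
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    with Pool(4) as pool:                 # 4 worker processes, one per core
        parallel = pool.map(func, data)
    t_mp = time.perf_counter() - start

    print(f"single: {t_single:.2f}s  mp: {t_mp:.2f}s  "
          f"speedup: {t_single / t_mp:.2f}x")
```

with `n_inner=1000` instead, the same script spends most of its parallel time on pickling and process communication, and the speedup drops below 1.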

below are the test results for different n on my mac. I use Pool(4) as I have 4 cores. Each test runs only once rather than multiple times like %timeit, since the difference is insignificant:

speedup graph

you can see the speedup ratio increases with n, because the per-call overhead of multiprocessing is amortized over more work in each call.
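a rough sketch of how such a sweep could be run (the helper names and the workload below are assumptions, not the exact test code used for the graph):

```python
import time
from multiprocessing import Pool

def func(x, n_inner):
    # busy loop standing in for real work; n_inner controls cost per call
    total = 0
    for i in range(n_inner):
        total += i * x
    return total

def speedup(pool, data, n_inner):
    start = time.perf_counter()
    for x in data:
        func(x, n_inner)
    t_single = time.perf_counter() - start

    start = time.perf_counter()
    pool.starmap(func, [(x, n_inner) for x in data])
    t_mp = time.perf_counter() - start
    return t_single / t_mp

if __name__ == "__main__":
    data = list(range(50))
    # create the pool once so its startup cost is not counted in each run
    with Pool(4) as pool:
        for n_inner in (1000, 10_000, 100_000, 1_000_000):
            print(n_inner, round(speedup(pool, data, n_inner), 2))
```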

the math behind it, assuming the per-call overhead is constant:

ratio = \frac{time_{single}}{time_{mp}} = \frac{cost_{work} \cdot n}{\frac{cost_{work} \cdot n}{p_{cores}} + cost_{overhead} \cdot n} = \frac{1}{\frac{1}{p_{cores}} + \frac{cost_{overhead}}{cost_{work}}}

if we want ratio > 1:

1 - \frac{1}{p_{cores}} > \frac{cost_{overhead}}{cost_{work}}

which, since 1 - \frac{1}{p_{cores}} is close to 1 once you have several cores, is approximately:

cost_{work} > cost_{overhead}

which means that if the work function runs too fast compared with the per-call overhead, multiprocessing does not scale.
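plugging illustrative numbers into the formula above (the ~1 ms per-call overhead below is an assumed value for the sake of the example; 335 µs and 0.4 s are the timings quoted earlier):

```python
# predicted speedup from the formula above; overhead value is illustrative
def predicted_ratio(cost_work, cost_overhead, p_cores):
    return 1 / (1 / p_cores + cost_overhead / cost_work)

print(predicted_ratio(cost_work=335e-6, cost_overhead=1e-3, p_cores=4))  # ~0.31, slower than serial
print(predicted_ratio(cost_work=0.4,    cost_overhead=1e-3, p_cores=4))  # ~3.96, close to 4x
```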
