Python multiprocessing for parallel processes

You are correct, they are executing sequentially in your example.

p.join() blocks the calling process until p has finished executing, so calling it inside your for loop starts one process, waits for it to complete, and only then starts the next. You'll either want to join your processes individually outside of your for loop (e.g., by storing them in a list and then iterating over it) or use something like multiprocessing.Pool with apply_async and a callback. The callback approach also lets you add each value to your result directly rather than keeping the Process objects around.
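A minimal sketch of the first approach (start everything, then join outside the loop); the Queue here is my addition for getting values back out of the worker processes, since a plain Process can't return a value:

    from multiprocessing import Process, Queue
    import numpy as np

    def f(i, q):
        # hypothetical worker: push the computed matrix back to the parent
        q.put(i * np.identity(4))

    if __name__ == '__main__':
        q = Queue()
        procs = [Process(target=f, args=(i, q)) for i in range(30)]
        for p in procs:
            p.start()          # all processes run concurrently

        result = np.zeros((4, 4))
        for _ in procs:
            result += q.get()  # drain results *before* joining

        for p in procs:
            p.join()           # now safe to block until each finishes
        print(result)

Note that the queue is drained before joining: a child process blocked on putting a large item into a full queue will never exit, so joining first can deadlock.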

For example:

from multiprocessing import Pool
import numpy as np

def f(i):
    return i * np.identity(4)

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    def adder(value):
        global result
        result += value

    for i in range(30):
        p.apply_async(f, args=(i,), callback=adder)
    p.close()
    p.join()
    print(result)

Closing and then joining the pool at the end ensures that the pool’s processes have completed and the result object is finished being computed. You could also investigate using Pool.imap as a solution to your problem. That particular solution would look something like this:

if __name__ == '__main__':
    p = Pool(5)
    result = np.zeros((4, 4))

    im = p.imap_unordered(f, range(30), chunksize=5)

    for x in im:
        result += x

    print(result)

This is cleaner for your specific situation, but may not be for whatever you are ultimately trying to do.

As to storing all of your varied results: if I understand your question, you can accumulate them into a single result in the callback method (as above) or item-at-a-time using imap/imap_unordered (which still buffers results internally, but you drain that buffer as you iterate). Either way, each value only needs to live long enough to be added to the running total.
