Using the multiprocessing module for cluster computing

If by cluster computing you mean distributed-memory systems (multiple nodes rather than SMP), then Python’s multiprocessing may not be a suitable choice. It can spawn multiple processes, but they will all be confined to a single node.
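To see the single-node limitation for yourself, here is a minimal sketch (the names `where_am_i` and `run` are made up for illustration) in which each pool worker reports the host it is running on. However many workers you ask for, they should all report the same hostname, since the pool is created by the parent process on its own machine.

```python
import multiprocessing as mp
import os
import socket


def where_am_i(task_id):
    # Hypothetical helper: report which host and process handled this task.
    return task_id, socket.gethostname(), os.getpid()


def run(n_tasks=8, n_workers=4):
    # The pool's workers are children of this process, so they live on this node.
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(where_am_i, range(n_tasks))


if __name__ == "__main__":
    results = run()
    hosts = {host for _, host, _ in results}
    pids = {pid for _, _, pid in results}
    print(f"{len(pids)} worker process(es) on host(s): {sorted(hosts)}")
```

Running this on a cluster login node or inside a batch job gives the same picture: several process IDs, one hostname.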

What you need is a framework that handles spawning processes across multiple nodes and provides a mechanism for communication between those processes (pretty much what MPI does).

See the page on Parallel Processing on the Python wiki for a list of frameworks which will help with cluster computing.

From that list, pp, jug, pyro and celery look like sensible options, although I can’t personally vouch for any of them since I have not used them (I mainly use MPI).

If ease of installation/use is important, I would start by exploring jug. It’s easy to install, supports common batch cluster systems, and looks well documented.
