`multiprocessing.Lock` is implemented using a Semaphore object provided by the OS. On Linux, the child just inherits a handle to the semaphore from the parent via `os.fork`. This isn’t a copy of the semaphore; the child inherits the very same handle the parent has, the same way file descriptors can be inherited. Windows, on the other hand, doesn’t support `os.fork`, so it has to pickle the `Lock`. It does this by creating a duplicate handle to the Windows semaphore used internally by the `multiprocessing.Lock` object, using the Windows `DuplicateHandle` API, whose documentation states:

> The duplicate handle refers to the same object as the original handle. Therefore, any changes to the object are reflected through both handles.

The `DuplicateHandle` API allows you to give ownership of the duplicated handle to the child process, so that the child can actually use it after unpickling it. By creating a duplicated handle owned by the child, you can effectively “share” the lock object.
Here’s the semaphore object in `multiprocessing/synchronize.py`:
```python
class SemLock(object):

    def __init__(self, kind, value, maxvalue):
        sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
        debug('created semlock with handle %s' % sl.handle)
        self._make_methods()

        if sys.platform != 'win32':
            def _after_fork(obj):
                obj._semlock._after_fork()
            register_after_fork(self, _after_fork)

    def _make_methods(self):
        self.acquire = self._semlock.acquire
        self.release = self._semlock.release
        self.__enter__ = self._semlock.__enter__
        self.__exit__ = self._semlock.__exit__

    def __getstate__(self):  # This is called when you try to pickle the `Lock`.
        assert_spawning(self)
        sl = self._semlock
        return (Popen.duplicate_for_child(sl.handle), sl.kind, sl.maxvalue)

    def __setstate__(self, state):  # This is called when unpickling a `Lock`.
        self._semlock = _multiprocessing.SemLock._rebuild(*state)
        debug('recreated blocker with handle %r' % state[0])
        self._make_methods()
```
Note the `assert_spawning` call in `__getstate__`, which gets invoked when pickling the object. Here’s how that is implemented:
```python
#
# Check that the current thread is spawning a child process
#

def assert_spawning(self):
    if not Popen.thread_is_spawning():
        raise RuntimeError(
            '%s objects should only be shared between processes'
            ' through inheritance' % type(self).__name__
            )
```
That function is the one that makes sure you’re “inheriting” the `Lock`, by calling `thread_is_spawning`. On Linux, that method just returns `False`:
```python
@staticmethod
def thread_is_spawning():
    return False
```
This is because Linux doesn’t need to pickle to inherit a `Lock`, so if `__getstate__` is actually being called on Linux, we must not be inheriting. On Windows, there’s more going on:
```python
def dump(obj, file, protocol=None):
    ForkingPickler(file, protocol).dump(obj)

class Popen(object):
    '''
    Start a subprocess to run the code of a process object
    '''
    _tls = thread._local()

    def __init__(self, process_obj):
        ...
        # send information to child
        prep_data = get_preparation_data(process_obj._name)
        to_child = os.fdopen(wfd, 'wb')
        Popen._tls.process_handle = int(hp)
        try:
            dump(prep_data, to_child, HIGHEST_PROTOCOL)
            dump(process_obj, to_child, HIGHEST_PROTOCOL)
        finally:
            del Popen._tls.process_handle
            to_child.close()

    @staticmethod
    def thread_is_spawning():
        return getattr(Popen._tls, 'process_handle', None) is not None
```
Here, `thread_is_spawning` returns `True` if the `Popen._tls` object has a `process_handle` attribute. We can see that the `process_handle` attribute is created in `__init__`, then the data we want inherited is passed from parent to child using `dump`, and then the attribute is deleted. So `thread_is_spawning` will only be `True` during `__init__`. According to this python-ideas mailing list thread, this is actually an artificial limitation added to simulate the same behavior as `os.fork` on Linux; Windows could actually support passing the `Lock` at any time, because `DuplicateHandle` can be run at any time.
All of the above applies to the `Queue` object as well, because it uses a `Lock` internally.
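You can see `assert_spawning` doing its job by trying to pickle a `Lock` directly, outside of process spawning (checked on CPython 3 with the default start method; the message text may vary slightly between versions):

```python
import multiprocessing as mp
import pickle

lock = mp.Lock()
try:
    pickle.dumps(lock)  # __getstate__ -> assert_spawning -> RuntimeError
except RuntimeError as e:
    print(e)  # complains that Lock objects should only be shared through inheritance
```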
I would say that inheriting `Lock` objects is preferable to using a `Manager.Lock()`, because with a `Manager.Lock`, every single call you make on the `Lock` must be sent via IPC to the `Manager` process, which is much slower than using a shared `Lock` that lives inside the calling process. Both approaches are perfectly valid, though.
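A rough way to see that overhead (just a sketch; absolute timings will vary by machine): run acquire/release pairs in a loop on a plain `mp.Lock` and on a `Manager.Lock` proxy and compare.

```python
import multiprocessing as mp
import time

def time_lock(lock, n=1000):
    # Time n acquire/release pairs on the given lock.
    start = time.perf_counter()
    for _ in range(n):
        lock.acquire()
        lock.release()
    return time.perf_counter() - start

if __name__ == "__main__":
    local = time_lock(mp.Lock())             # OS semaphore in this process
    with mp.Manager() as manager:
        proxied = time_lock(manager.Lock())  # every call is an IPC round trip
    print("in-process: %.4fs, via Manager: %.4fs" % (local, proxied))
```

The proxied lock pays a round trip to the `Manager` process on every `acquire` and `release`.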
Finally, it is possible to pass a `Lock` to all members of a `Pool` without using a `Manager`, using the `initializer`/`initargs` keyword arguments:
```python
import multiprocessing as mp

lock = None

def initialize_lock(l):
    global lock
    lock = l

def scenario_1_pool_no_manager(jobfunc, args, ncores):
    """Runs a pool of processes WITHOUT a Manager for the lock and queue."""
    lock = mp.Lock()
    mypool = mp.Pool(ncores, initializer=initialize_lock, initargs=(lock,))
    queue = mp.Queue()

    iterator = make_iterator(args, queue)
    # Don't pass the lock; it has to be used as a global in the child.
    # (This means `jobfunc` would need to be rewritten slightly.)
    mypool.imap(jobfunc, iterator)
    mypool.close()
    mypool.join()

    return read_queue(queue)
```
This works because arguments passed to `initargs` get passed to the `__init__` method of the `Process` objects that run inside the `Pool`, so they end up being inherited, rather than pickled at task-submission time.
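Here’s a minimal, self-contained version of that pattern (with a hypothetical `job` function, and `pool.map` instead of the `imap` call above, so the results are easy to collect):

```python
import multiprocessing as mp

lock = None  # each worker's global, set by the initializer

def init_lock(l):
    global lock
    lock = l

def job(i):
    with lock:       # uses the inherited global lock, not an argument
        return i * i

if __name__ == "__main__":
    shared = mp.Lock()
    with mp.Pool(2, initializer=init_lock, initargs=(shared,)) as pool:
        print(pool.map(job, range(5)))  # [0, 1, 4, 9, 16]
```

Note that `job` takes no lock argument at all; the initializer runs once in each worker process and stashes the inherited `Lock` in a global before any jobs execute.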