How do I recover a semaphore when the process that decremented it to zero crashes?

It turns out there is no reliable way to recover the semaphore. Sure, anyone can call sem_post() on the named semaphore to raise the count above zero again, but how do you tell when such a recovery is needed? The API is too limited and gives no indication that the process holding the semaphore has died.

Beware of the IPC command-line tools as well: ipcmk, ipcrm, and ipcs only handle the older System V semaphores. They do not work with POSIX semaphores.

But there are other locking mechanisms that the operating system releases automatically when an application dies, even when it dies in a way that cannot be caught in a signal handler. Two examples: a listening socket bound to a particular port, or a lock on a specific file.

I decided the lock on a file is the solution I needed. So instead of a sem_wait() and sem_post() call, I’m using:

lockf( fd, F_LOCK, 0 )

and

lockf( fd, F_ULOCK, 0 )

When the application exits in any way, the file is automatically closed, which also releases the file lock. Other client apps waiting for the “semaphore” are then free to proceed as expected.
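To make the two calls above concrete, here is a minimal sketch of how they could be wrapped. The helper names lock_acquire() and lock_release() and the lock file path are made up for illustration; only open(), lockf(), and close() are real API calls. It assumes every cooperating process agrees on the same lock file path.

```c
#define _XOPEN_SOURCE 700   /* make sure lockf() is declared */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the lock file and block
 * until this process owns the lock. Returns the fd, or -1 on error. */
int lock_acquire(const char *path)
{
    /* lockf() requires a descriptor that is open for writing. */
    int fd = open(path, O_RDWR | O_CREAT, 0666);
    if (fd < 0)
        return -1;

    /* Length 0 locks from the current offset (0 here) to end of file. */
    if (lockf(fd, F_LOCK, 0) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: release the lock and close the file.
 * Returns the result of the unlock call (0 on success). */
int lock_release(int fd)
{
    int rc = lockf(fd, F_ULOCK, 0);
    close(fd);
    return rc;
}
```

A caller would do something like fd = lock_acquire("/tmp/myapp.lock"), run the critical section, then lock_release(fd). If the process dies in between, the kernel closes the descriptor and the lock goes away with it. Note that F_LOCK can return early with EINTR if a signal arrives, so a real program may want to retry in that case.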

Thanks for the help, guys.


UPDATE:

12 years later, I thought I should point out that POSIX mutexes do have a “robust” attribute. With it set, if the owner of the mutex gets killed or exits while holding the lock, the next caller to lock the mutex gets EOWNERDEAD back from pthread_mutex_lock(). Despite the nonzero return value, the lock has been acquired, so the caller can repair any shared state and recover the mutex. This makes it similar to the file and socket locking solutions. Look up pthread_mutexattr_setrobust() and pthread_mutex_consistent() for details. Thanks, Reinier Torenbeek, for this hint.
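Here is a rough sketch of what the robust mutex approach can look like between processes. The function names robust_mutex_create(), robust_mutex_lock(), and demo_owner_death() are invented for this example. It assumes a Linux-like system with MAP_ANONYMOUS, and it uses an anonymous shared mapping, which only works between a parent and its forked children; unrelated processes would need something like shm_open() or a file-backed mapping instead.

```c
#define _GNU_SOURCE         /* for MAP_ANONYMOUS */
#include <errno.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a robust, process-shared mutex in
 * anonymous shared memory. Returns NULL on failure. */
pthread_mutex_t *robust_mutex_create(void)
{
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    if (rc != 0) {
        munmap(m, sizeof *m);
        return NULL;
    }
    return m;
}

/* Hypothetical helper: lock the mutex, recovering it if the previous
 * owner died. Returns 0 for a normal lock, 1 if the lock was recovered
 * from a dead owner, -1 on any other error. */
int robust_mutex_lock(pthread_mutex_t *m)
{
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc == EOWNERDEAD) {
        /* We hold the lock, but the data it protects may be in a
         * half-updated state. Repair it here, then mark the mutex
         * usable again. */
        if (pthread_mutex_consistent(m) != 0)
            return -1;
        return 1;
    }
    return -1;
}

/* Demo: a child takes the lock and exits without unlocking.
 * Returns what robust_mutex_lock() reports in the parent afterwards. */
int demo_owner_death(void)
{
    pthread_mutex_t *m = robust_mutex_create();
    if (m == NULL)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(0);           /* stand-in for a crash while holding the lock */
    }
    waitpid(pid, NULL, 0);

    int result = robust_mutex_lock(m);
    if (result >= 0)
        pthread_mutex_unlock(m);
    pthread_mutex_destroy(m);
    munmap(m, sizeof *m);
    return result;
}
```

The important part is the EOWNERDEAD branch: skipping pthread_mutex_consistent() before unlocking leaves the mutex permanently unusable, and later lock attempts fail with ENOTRECOVERABLE.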
