Why is a segmentation fault not recoverable?

When exactly does segmentation fault happen (=when is SIGSEGV sent)?

When you attempt to access memory you don’t have access to, such as accessing an array out of bounds or dereferencing an invalid pointer. The signal SIGSEGV is standardized but different OS might implement it differently. “Segmentation fault” is mainly a term used in *nix systems, Windows calls it “access violation”.

Why is the process in undefined behavior state after that point?

Because one or several of the variables in the program didn’t behave as expected. Let’s say you have some array that is supposed to store a number of values, but you didn’t allocate enough room for all them. So only those you allocated room for get written correctly, and the rest written out of bounds of the array can hold any values. How exactly is the OS to know how critical those out of bounds values are for your application to function? It knows nothing of their purpose.

Furthermore, writing outside allowed memory can often corrupt other unrelated variables, which is obviously dangerous and can cause any random behavior. Such bugs are often hard to track down. Stack overflows for example are such segmentation faults prone to overwrite adjacent variables, unless the error was caught by protection mechanisms.

If we look at the behavior of “bare metal” microcontroller systems without any OS and no virtual memory features, just raw physical memory – they will just silently do exactly as told – for example, overwriting unrelated variables and keep on going. Which in turn could cause disastrous behavior in case the application is mission-critical.

Why is it not recoverable?

Because the OS doesn’t know what your program is supposed to be doing.

Though in the “bare metal” scenario above, the system might be smart enough to place itself in a safe mode and keep going. Critical applications such as automotive and med-tech aren’t allowed to just stop or reset, as that in itself might be dangerous. They will rather try to “limp home” with limited functionality.

Why does this solution avoid that unrecoverable state? Does it even?

That solution is just ignoring the error and keeps on going. It doesn’t fix the problem that caused it. It’s a very dirty patch and setjmp/longjmp in general are very dangerous functions that should be avoided for any purpose.

We have to realize that a segmentation fault is a symptom of a bug, not the cause.

Leave a Comment