C Program crashes when adding an extra int

Question

You said your first program “works”, but it works only by accident. It’s like a car zooming down the road with no lugnuts holding on the front wheels, only by some miracle they haven’t fallen off — yet.

You said

char *lineName;

This gives you a pointer variable that can point to some characters, but it doesn’t point anywhere yet. The value of this pointer is undefined. It’s sort of like saying “int i” and asking what the value of i is.

Next you said

scanf("%s (%d)", lineName, &stationNo)

You’re asking scanf to read a line name and store the string in the memory pointed to by lineName. But where is that memory? We have no idea whatsoever!

The situation with uninitialized pointers is a little trickier to think about because, as always, with pointers we have to distinguish between the value of the pointer as opposed to the data at the memory which the pointer points to. Earlier I mentioned saying int i and asking what the value of i is. Now, there’s going to be some bit pattern in i — it might be 0, or 1, or -23, or 8675309.

Similarly, there’s going to be some bit pattern in lineName — it might “point at” memory location 0x00000000, or 0xffe01234, or 0xdeadbeef. But then the questions are: is there actually any memory at that location, and do we have permission to write to it, and is it being used for anything else? If there is memory and we do have permission and it’s not being used for anything else, the program might seem to work — for now. But those are three pretty big ifs! If the memory doesn’t exist, or if we don’t have permission to write to it, the program is probably going to crash when it tries. And if the memory is being used for something else, something’s going to go wrong — if not now, then later — when we ask scanf to write its string there.

And, really, if what we care about is writing programs that work (and that work for the right reasons), we don’t have to ask any of these questions. We don’t have to ask where lineName points when we don’t initialize it, or whether there’s any memory there, or if we have permission to write to it, or if it’s being used for something else. Instead, we should simply, actually, initialize lineName! We should explicitly make it point to memory that we do own and that we are allowed to write to and that isn’t being used for anything else!

There are several ways to do this. The easiest is to use an array for lineName, not a pointer:

char lineName[20];

Or, if we have our hearts set on using a pointer, we can call malloc:

char *lineName = malloc(20);

However, if we do that, we have to check to make sure malloc succeeded:

if(lineName == NULL) {
    fprintf(stderr, "out of memory!\n");
    exit(1);
}

If you make either of those changes, your program will work.

…Well, actually, we’re still in a situation where your program will seem to work, even though it still has another, pretty serious, lurking problem. We’ve allocated 20 characters for lineName, which gives us 19 actual characters, plus the trailing '\0'. But we don’t know what the user is going to type. What if the user types 20 or more characters? That will cause scanf to write more than 20 characters to lineName, off past the end of what lineName‘s memory is allowed to hold, and we’re back in the situation of writing to memory that we don’t own and that might be in use for something else.

One solution is to make lineName bigger — declare it as char lineName[100], or call malloc(100). But that just moves the problem around — now we have to worry about the (perhaps smaller) chance that the user will type 100 or more characters. So the next thing to do is to tell scanf not to write more to lineName than we’ve arranged for it to hold. This is actually pretty simple. If lineName is still set up to hold 20 characters, just call

scanf("%19s (%d)", lineName, &stationNo)

That format specifier %19s tells scanf that it’s only allowed to read and store a string of up to 19 characters long, leaving one byte free for the terminating '\0' that it’s also going to add.

Now, I’ve said a lot here, but I realize I haven’t actually gotten around to answering the question of why your program went from working to crashing when you made that seemingly trivial, seemingly unrelated change. This ends up being a hard question to answer satisfactorily. Going back to the analogy I started this answer with, it’s like asking why you were able to drive the car with no lugnuts to the store with no problem, but when you tried to drive to grandma’s house, the wheels fell off and you crashed into a ditch. There are a million possible factors that might have come into play, but none of them change the underlying fact that driving a car with the wheels not fastened on is a crazy idea, that’s not guaranteed to work at all.

In your case, the variables you’re talking about — lineName, stationNo, i, and then b — are all local variables, typically allocated on the stack. Now, one of the characteristics of the stack is that it gets used for all sorts of stuff, and it never gets cleared between uses. So if you have an uninitialized local variable, the particular random bits that it ends up containing depend on whatever was using that piece of the stack last time. If you change your program slightly so that different functions get called, those different functions may leave different random values lying around on the stack. Or if you change your function to allocate different local variables, the compiler may place them in different spots on the stack, meaning that they’ll end up picking up different random values from whatever was there last time.

Anyway, somehow, with the first version of your program, lineName ended up containing a random value that corresponded to a pointer that pointed to actual memory which you could get away with writing to. But when you added that fourth variable b, things moved around just enough that lineName ended up pointing to memory that didn’t exist or that you didn’t have permission to write to, and your program crashed.

Make sense?

And now, one more thing, if you’re still with me. If you stop and think, this whole thing might be kind of unsettling. You had a program (your first program) that seemed to work just fine, but actually had a decently horrible bug. It wrote to random, unallocated memory. But when you compiled it you got no fatal error messages, and when you ran it there was no indication that anything was amiss. What’s up with that?

The answer, as a couple of the comments alluded to, involves what we call undefined behavior.

It turns out that there are three kinds of C programs, which we might call the good, the bad, and the ugly.

Good programs work for the right reasons. They don’t break any rules, they don’t do anything illegal. They don’t get any warnings or error messages when you compile them, and when you run them, they just work.
Bad programs break some rule, and the compiler catches this, and issues a fatal error message, and declines to produce a broken program for you to try to run.
But then there are the ugly programs, that engage in undefined behavior. These are the ones that break a different set of rules, the ones that, for various reasons, the compiler is not obliged to complain about. (Indeed the compiler may or may not even be able to detect them). And programs that engage in undefined behavior can do anything.

Let’s think about that last point a little more. The compiler is not obligated to generate error messages when you write a program that uses undefined behavior, so you might not realize you’ve done it. And the program is allowed to do anything, including work as you expect. But then, since it’s allowed to do anything, it might stop working tomorrow, seemingly for no reason at all, either because you made some seemingly innocuous change to it, or merely because you’re not around to defend it, as it quietly runs amok and deletes all your customer’s data.

So what are you supposed to do about this?

One thing is to use a modern compiler if you can, and turn on its warnings, and pay attention to them. (Good compilers even have an option called “treat warnings as errors”, and programmers who care about correct programs usually turn this option on.) Even though, as I said, they’re not required to, compilers are getting better and better at detecting undefined behavior and warning you about it, if you ask them to.

And then the other thing, if you’re going to be doing a lot of C programming, is to take care to learn the language, what you’re allowed to do, what you’re not supposed to do. Make a point of writing programs that work for the right reasons. Don’t settle for a program that merely seems to work today. And if someone points out that you’re depending on undefined behavior, don’t say, “But my program works — why should I care?” (You didn’t say this, but some people do.)

More Related Contents:

Leave a Comment Cancel reply