Is accessing the “value” of a linker script variable undefined behavior in C?

Shorter answer:

Accessing the “value” of a linker script variable is NOT undefined behavior (its behavior is defined by the GNU toolchain rather than by the C standard), and it is fine to do, as long as what you want is the actual data stored at that location in memory. It is not what you want if you are after the address of that memory, or after the linker's “value” of the symbol, because C code sees that linker value only as an address, never as the contents stored there.

Yeah, that’s kind of confusing, so re-read that 3 times carefully. Essentially, if you want to read the value stored at a linker script variable's address, make sure your linker script places what you want at that address and keeps anything you don’t want from ending up there. That way, reading the value at that address gives you something useful that you expect to be there.
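One way to make sure the right data really is at a linker-script location is to let the compiler emit a properly typed object into its own section, then have the linker script place that section and name its address. A minimal sketch, assuming GCC's `section` and `used` attributes; the section name `.config_word`, the symbol `__config_word__` and the object `config_word` are all hypothetical:

```c
#include <stdint.h>

/* GCC-specific: emit a real, properly typed and aligned uint32_t into its own
 * section. The section name ".config_word" is hypothetical. A firmware linker
 * script could then place it and name its address, for example:
 *
 *     .config_word : { __config_word__ = .; KEEP(*(.config_word)) } > FLASH
 *
 * after which __config_word__ marks a location where a uint32_t really lives.
 * On a hosted build the linker simply places the section somewhere valid. */
__attribute__((section(".config_word"), used))
const uint32_t config_word = 0x12345678u;

/* Reads the object through its C name, with its proper type. */
uint32_t read_config_word(void)
{
    return config_word;
}
```

With that linker script in place, `extern const uint32_t __config_word__;` in C would read the same word; that declaration is left out of the runnable sketch because it only links against the hypothetical script.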

BUT, if you’re using linker script variables to store some sort of “values” in and of themselves, the way to get the “values” of these linker script variables in C is to take their addresses. The “value” you assign to a variable in a linker script IS SEEN BY THE C COMPILER AS THE “ADDRESS” of that variable, because linker scripts are designed to manipulate memory and memory addresses, NOT traditional C variables.
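The “value is an address” rule can be seen with symbols that any hosted Linux build already has. A minimal sketch, assuming GNU ld (or a compatible linker such as lld) with its default linker script, which provides `edata` and `end` as described in the end(3) man page; the wrapper function names are hypothetical. As with the array-style declaration preferred later in this answer, the symbols are declared as arrays so the bare name evaluates to the address, which is the linker's value of the symbol:

```c
#include <stdint.h>

/* Linker-defined symbols (see end(3)). Assumption: GNU ld or a compatible
 * linker with its default script, which PROVIDEs them on Linux. No C object
 * is meant to be read "at" these symbols; only their addresses matter. */
extern char edata[]; /* first address past the initialized data segment */
extern char end[];   /* first address past the uninitialized data (BSS) segment */

/* The linker's "value" of each symbol, as C sees it: an address. */
uintptr_t linker_value_of_edata(void)
{
    return (uintptr_t)edata;
}

uintptr_t linker_value_of_end(void)
{
    return (uintptr_t)end;
}
```

Both functions return addresses and neither dereferences the symbol. Dereferencing would instead read whatever bytes happen to be stored there, which is exactly the distinction between ways A/B and #3 quoted in the long answer.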

Here are some really valuable and correct comments from under my question, which I think are worth posting in this answer so they never get lost. Please go upvote his comments under my question above.

The C standard does not define anything at all about linker script symbols. Any specification of behavior is up to the GNU tools. That said, if a linker script symbol identifies a place in memory where some valid object is stored, I would expect accessing the value of that object to work, if it were accessed with its proper type. Supposing __flash_start__ is normally accessible memory, and except for any requirements of your system about what is at __flash_start__, you could, in theory, put a uint32_t (using appropriate input to the linker) and then access it via __flash_start__.
– Eric Postpischil

That documentation is not written very well, and you are taking the first sentence too literally. What is really happening here is that the linker’s notion of the “value” of a symbol and a programming language’s notion of the “value” of an identifier are different things. To the linker, the value of a symbol is simply a number associated with it. In a programming language, the value is a number (or other element in the set of values of some type) stored in the (sometimes notional) storage associated with the identifier. The documentation is advising you that the linker’s value of a symbol appears inside a language like C as the address associated with the identifier, rather than the contents of its storage…

THIS PART IS REALLY IMPORTANT and we should get the GNU linker script manual updated:

It goes too far when it tells you to “never attempt to use its value.”

It is correct that merely defining a linker symbol does not reserve the necessary storage for a programming language object, and therefore merely having a linker symbol does not provide you storage you can access. However, if you ensure storage is allocated by some other means, then, sure, it can work as a programming language object. There is no general prohibition on using a linker symbol as an identifier in C, including accessing its C value, if you have properly allocated storage and otherwise satisfied the requirements for this. If the linker value of __flash_start__ is a valid memory address, and you have ensured there is storage for a uint32_t at that address, and it is a properly aligned address for a uint32_t, then it is okay to access __flash_start__ in C as if it were a uint32_t. That would not be defined by the C standard, but by the GNU tools.
– Eric Postpischil
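Part of that checklist can be expressed in code. A minimal sketch with hypothetical function names: alignment can be checked at run time, but whether storage for a uint32_t actually exists at the symbol's address cannot, so that part stays the linker script's job.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is suitably aligned for a uint32_t (the alignment
 * condition from the comment above). */
int is_aligned_for_u32(uintptr_t addr)
{
    return addr % _Alignof(uint32_t) == 0;
}

/* Reads a uint32_t at the address a linker symbol names. Only alignment is
 * checked; the existence of real storage there cannot be verified at run
 * time and must be guaranteed by the linker script. */
uint32_t read_u32_at_symbol(const uint32_t *sym)
{
    assert(is_aligned_for_u32((uintptr_t)sym));
    return *sym;
}
```

On the target, one might call `read_u32_at_symbol(__flash_start__)` using the array-style declaration of `__flash_start__`; that call is an assumption about the firmware build and is not exercised by this sketch.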

Long answer:

I said in the question:

// 1. correct way A:
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)&__flash_start__);

// OR 2. correct way B (my preferred approach):
extern uint32_t __flash_start__[]; // not a true array; the [] makes the name itself evaluate to the symbol's address, so no & is needed
printf("__flash_start__ addr = 0x%lX\n", (uint32_t)__flash_start__);

// OR 3. COMPLETELY WRONG WAY TO DO IT!
// - IS THIS UNDEFINED BEHAVIOR?
extern uint32_t __flash_start__;
printf("__flash_start__ addr = 0x%lX\n", __flash_start__);

(See the discussion under the question for how I came to this.)
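The printf calls quoted above use %lX with a cast to uint32_t. That only matches when uint32_t is unsigned long (typical for arm-none-eabi GCC, but not for x86-64 Linux, where it is unsigned int), and casting a pointer to uint32_t truncates addresses on 64-bit hosts. A more portable sketch uses the <inttypes.h> format macros; the function names are hypothetical:

```c
#include <inttypes.h>
#include <stdio.h>

/* Formats a linker symbol's "value" (its address). PRIXPTR always matches
 * uintptr_t, which is wide enough to hold an object pointer. */
int format_symbol_address(char *buf, size_t n, const void *sym)
{
    return snprintf(buf, n, "0x%" PRIXPTR, (uintptr_t)sym);
}

/* Formats the 32-bit contents stored at that address. PRIX32 always
 * matches uint32_t, whatever its underlying type is. */
int format_u32_contents(char *buf, size_t n, uint32_t value)
{
    return snprintf(buf, n, "0x%08" PRIX32, value);
}
```

The same macros work directly in printf, for example `printf("__flash_start__ addr = 0x%" PRIXPTR "\n", (uintptr_t)__flash_start__);` with the array-style declaration.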

Looking specifically at #3 above:

Well, actually, if your goal is to read the address of __flash_start__, which is 0x8000000 in this case, then yes, this is completely wrong. But, it is NOT undefined behavior! What it is actually doing, instead, is reading the contents (value) of that address (0x8000000) as a uint32_t type. In other words, it’s simply reading the first 4 bytes of the FLASH section, and interpreting them as a uint32_t. The contents (uint32_t value at this address) just so happen to be 0x20080000 in this case.
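The difference between ways A/B and #3 can be reproduced without hardware. A minimal sketch, using an ordinary object as a hypothetical stand-in for the word at __flash_start__ (on the STM32F777 that word lives at 0x8000000 and is supplied by the vector table); the names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical stand-in for the first word of Flash. On the target, the
 * linker provides this location; here an ordinary object holds the same
 * value the answer observed there. */
const uint32_t flash_start_stand_in = 0x20080000u;

/* Ways A and B: the address, i.e. what the linker calls the symbol's value. */
uintptr_t symbol_address(void)
{
    return (uintptr_t)&flash_start_stand_in;
}

/* Way 3: the 32-bit contents stored at that address. */
uint32_t symbol_contents(void)
{
    return flash_start_stand_in;
}
```

Both are well defined; they simply answer different questions (where the word is versus what the word holds).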

To further prove this point, the following two reads are equivalent:

// Read the actual *contents* of the `__flash_start__` address as a 4-byte value!

// forward declaration to make a variable defined in the linker script
// accessible in the C code
extern uint32_t __flash_start__; 

// These 2 read techniques do the exact same thing.
uint32_t u32_1 = __flash_start__;                 // technique 1
uint32_t u32_2 = *((uint32_t *)&__flash_start__); // technique 2
printf("u32_1 = 0x%lX\n", u32_1);
printf("u32_2 = 0x%lX\n", u32_2);

The output is:

u32_1 = 0x20080000
u32_2 = 0x20080000

Notice that they produce the same result: both read the uint32_t value stored at address 0x8000000.

It turns out, however, that the u32_1 technique shown above is simply the more straightforward and direct way of reading that value. Again, it is not undefined behavior; it correctly reads the value (contents) at that address.

I seem to be talking in circles. Anyway, mind blown, but I get it now. I was convinced I had to use the u32_2 technique shown above, but it turns out both are fine, and the u32_1 technique is clearly more straightforward. 🙂

Cheers.


Digging deeper: Where did the 0x20080000 value stored right at the start of my FLASH memory come from?

One more little tidbit. I actually ran this test code on an STM32F777 MCU, which has 512 KiB of RAM. Since RAM starts at address 0x20000000, the end of RAM is 0x20000000 + 512 KiB (0x80000) = 0x20080000. That is also the value stored in the first 4 bytes of Flash (at 0x8000000), because Programming Manual PM0253 Rev 4, pg. 42, “Figure 10. Vector table” shows that the first 4 bytes of the Vector Table contain the “Initial SP [Stack Pointer] value”. See here:

[Image: PM0253 Rev 4, Figure 10. Vector table]

I know that the Vector Table sits right at the start of program memory, which is located in Flash, so 0x20080000 must be my initial stack pointer value. This makes sense: the Reset_Handler is the start of the program (its vector happens to be the second 4-byte value in the Vector Table, by the way), and the first thing it does, as shown in my “startup_stm32f777xx.s” startup assembly file, is set the stack pointer (sp) to _estack:

Reset_Handler:  
  ldr   sp, =_estack      /* set stack pointer */

Furthermore, _estack is defined in my linker script as follows:

/* Highest address of the user mode stack */
_estack = ORIGIN(RAM) + LENGTH(RAM);    /* end of RAM */

So there you have it! The first 4-byte value in my Vector Table, right at the start of Flash, is the initial stack pointer value, which is defined as _estack in my linker script, and _estack is the address at the end of my RAM: 0x20000000 + 512 KiB = 0x20080000. So it all makes sense, and I’ve confirmed I read the right value!
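The arithmetic and the vector-table layout above can be checked in plain C. A minimal sketch: the macro, type and function names are hypothetical, the constants are the ones from this answer, and the Flash image passed in would, on the target, be the words starting at 0x8000000.

```c
#include <stdint.h>

/* Constants from this answer: STM32F777 RAM starts at 0x20000000 and is
 * 512 KiB long. Macro names are hypothetical. */
#define RAM_ORIGIN 0x20000000u
#define RAM_LENGTH (512u * 1024u) /* 0x80000 bytes */

/* C mirror of the linker-script line _estack = ORIGIN(RAM) + LENGTH(RAM). */
#define ESTACK (RAM_ORIGIN + RAM_LENGTH)
_Static_assert(ESTACK == 0x20080000u, "end of RAM should be 0x20080000");

/* First two vector-table entries, per PM0253 Figure 10:
 * word 0 = initial SP value, word 1 = Reset_Handler vector. */
typedef struct {
    uint32_t initial_sp;
    uint32_t reset_vector;
} vector_table_head;

/* Decodes the first two 32-bit words of a Flash image. */
vector_table_head decode_vector_table_head(const uint32_t *flash_words)
{
    vector_table_head h = { flash_words[0], flash_words[1] };
    return h;
}

/* Nonzero if the image's initial SP equals the end of RAM (_estack). */
int initial_sp_is_estack(const uint32_t *flash_words)
{
    return decode_vector_table_head(flash_words).initial_sp == ESTACK;
}
```

On the firmware itself, one might call `initial_sp_is_estack((const uint32_t *)__flash_start__)` with the array-style declaration of `__flash_start__`; that call assumes the target build and is not exercised here.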

See also:

  1. [my answer] How to get value of variable defined in ld linker script from C
