String Literal address across translation units [duplicate]

You cannot rely on identical string literals having the same memory location; that is an implementation decision. The C99 draft standard says it is unspecified whether identical string literals are stored in distinct arrays. From section 6.4.5 String literals:

It is unspecified whether these arrays are distinct provided their
elements have the appropriate values. If the program attempts to
modify such an array, the behavior is undefined.

For C++ this is covered in draft standard section 2.14.5 String literals, which says:

Whether all string literals are distinct (that is, are stored in
nonoverlapping objects) is implementation defined. The effect of
attempting to modify a string literal is undefined.

The compiler is allowed to pool string literals, but how it does so differs from compiler to compiler, so relying on it is not portable and the behavior could change. Visual Studio includes an option for string-literal pooling:

In some cases, identical string literals may be pooled to save space
in the executable file. In string-literal pooling, the compiler causes
all references to a particular string literal to point to the same
location in memory, instead of having each reference point to a
separate instance of the string literal. To enable string pooling, use
the /GF compiler option.

Note that the documentation qualifies this with "In some cases".

gcc also supports pooling, including across compilation units, and you can turn it on via -fmerge-constants:

Attempt to merge identical constants (string constants and
floating-point constants) across compilation units.

This option is the default for optimized compilation if the assembler
and linker support it. Use -fno-merge-constants to inhibit this
behavior.

Note the use of "attempt" and "if … support it".

As for the rationale, at least in C, for not requiring string literals to be pooled, this archived comp.std.c discussion on string literals indicates that it was due to the wide variety of implementations at the time:

GCC might have served as an example but not as motivation. Partly the
desire to have string literals in ROMmable data was to support, er,
ROMming. I vaguely recall having used a couple of C implementations
(before the X3J11 decision was made) where string literals were either
automatically pooled or stored in a constant data program section.
Given the existing variety of practice and the availability of an easy
work-around when the original UNIX properties were wanted, it seemed
best to not try to guarantee uniqueness and writability of string
literals.
