Efficient unsigned-to-signed cast avoiding implementation-defined behavior

Expanding on user71404’s answer:

int f(unsigned x)
{
    if (x <= INT_MAX)
        return static_cast<int>(x);

    if (x >= INT_MIN)
        return static_cast<int>(x - INT_MIN) + INT_MIN;

    throw x; // Or whatever else you like
}

If x >= INT_MIN (keep the promotion rules in mind, INT_MIN gets converted to unsigned), then x - INT_MIN <= INT_MAX, so this won’t have any overflow.

If that is not obvious, take a look at the claim “If x >= -4u, then x + 4 <= 3.”, and keep in mind that INT_MAX will be equal to at least the mathematical value of -INT_MIN – 1.

On the most common systems, where !(x <= INT_MAX) implies x >= INT_MIN, the optimizer should be able (and on my system, is able) to remove the second check, determine that the two return statements can be compiled to the same code, and remove the first check too. Generated assembly listing:

__Z1fj:
LFB6:
    .cfi_startproc
    movl    4(%esp), %eax
    ret
    .cfi_endproc

The hypothetical implementation in your question:

INT_MAX equals 32767
INT_MIN equals -2³² + 32768

is not possible, so does not need special consideration. INT_MIN will be equal to either -INT_MAX, or to -INT_MAX - 1. This follows from C’s representation of integer types (6.2.6.2), which requires n bits to be value bits, one bit to be a sign bit, and only allows one single trap representation (not including representations that are invalid because of padding bits), namely the one that would otherwise represent negative zero / -INT_MAX - 1. C++ doesn’t allow any integer representations beyond what C allows.

Update: Microsoft’s compiler apparently does not notice that x > 10 and x >= 11 test the same thing. It only generates the desired code if x >= INT_MIN is replaced with x > INT_MIN - 1u, which it can detect as the negation of x <= INT_MAX (on this platform).

[Update from questioner (Nemo), elaborating on our discussion below]

I now believe this answer works in all cases, but for complicated reasons. I am likely to award the bounty to this solution, but I want to capture all the gory details in case anybody cares.

Let’s start with C++11, section 18.3.3:

Table 31 describes the header <climits>.

…

The contents are the same as the Standard C library header <limits.h>.

Here, “Standard C” means C99, whose specification severely constrains the representation of signed integers. They are just like unsigned integers, but with one bit dedicated to “sign” and zero or more bits dedicated to “padding”. The padding bits do not contribute to the value of the integer, and the sign bit contributes only as twos-complement, ones-complement, or sign-magnitude.

Since C++11 inherits the <climits> macros from C99, INT_MIN is either -INT_MAX or -INT_MAX-1, and hvd’s code is guaranteed to work. (Note that, due to the padding, INT_MAX could be much less than UINT_MAX/2… But thanks to the way signed->unsigned casts work, this answer handles that fine.)

C++03/C++98 is trickier. It uses the same wording to inherit <climits> from “Standard C”, but now “Standard C” means C89/C90.

All of these — C++98, C++03, C89/C90 — have the wording I give in my question, but also include this (C++03 section 3.9.1 paragraph 7):

The representations of integral types shall define values by use of a
pure binary numeration system.(44) [Example: this International
Standard permits 2’s complement, 1’s complement and signed magnitude
representations for integral types.]

Footnote (44) defines “pure binary numeration system”:

A positional representation for integers that uses the binary digits 0
and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral
power of 2, except perhaps for the bit with the highest position.

What is interesting about this wording is that it contradicts itself, because the definition of “pure binary numeration system” does not permit a sign/magnitude representation! It does allow the high bit to have, say, the value -2^n-1 (twos complement) or -(2^n-1-1) (ones complement). But there is no value for the high bit that results in sign/magnitude.

Anyway, my “hypothetical implementation” does not qualify as “pure binary” under this definition, so it is ruled out.

However, the fact that the high bit is special means we can imagine it contributing any value at all: A small positive value, huge positive value, small negative value, or huge negative value. (If the sign bit can contribute -(2^n-1-1), why not -(2^n-1-2)? etc.)

So, let’s imagine a signed integer representation that assigns a wacky value to the “sign” bit.

A small positive value for the sign bit would result in a positive range for int (possibly as large as unsigned), and hvd’s code handles that just fine.

A huge positive value for the sign bit would result in int having a maximum larger than unsigned, which is is forbidden.

A huge negative value for the sign bit would result in int representing a non-contiguous range of values, and other wording in the spec rules that out.

Finally, how about a sign bit that contributes a small negative quantity? Could we have a 1 in the “sign bit” contribute, say, -37 to the value of the int? So then INT_MAX would be (say) 2³¹-1 and INT_MIN would be -37?

This would result in some numbers having two representations… But ones-complement gives two representations to zero, and that is allowed according to the “Example”. Nowhere does the spec say that zero is the only integer that might have two representations. So I think this new hypothetical is allowed by the spec.

Indeed, any negative value from -1 down to -INT_MAX-1 appears to be permissible as a value for the “sign bit”, but nothing smaller (lest the range be non-contiguous). In other words, INT_MIN might be anything from -INT_MAX-1 to -1.

Now, guess what? For the second cast in hvd’s code to avoid implementation-defined behavior, we just need x - (unsigned)INT_MIN less than or equal to INT_MAX. We just showed INT_MIN is at least -INT_MAX-1. Obviously, x is at most UINT_MAX. Casting a negative number to unsigned is the same as adding UINT_MAX+1. Put it all together:

x - (unsigned)INT_MIN <= INT_MAX

if and only if

UINT_MAX - (INT_MIN + UINT_MAX + 1) <= INT_MAX
-INT_MIN-1 <= INT_MAX
-INT_MIN <= INT_MAX+1
INT_MIN >= -INT_MAX-1

That last is what we just showed, so even in this perverse case, the code actually works.

That exhausts all of the possibilities, thus ending this extremely academic exercise.

Bottom line: There is some seriously under-specified behavior for signed integers in C89/C90 that got inherited by C++98/C++03. It is fixed in C99, and C++11 indirectly inherits the fix by incorporating <limits.h> from C99. But even C++11 retains the self-contradictory “pure binary representation” wording…

More Related Contents:

Leave a Comment Cancel reply