Do C and C++ guarantee the ASCII values of the [a-f] and [A-F] characters?

No, they do not.

The C standard guarantees that the decimal digits and uppercase and lowercase letters exist, along with a number of other characters. It also guarantees that the decimal digits are contiguous, for example '0' + 9 == '9', and that all members of the basic execution character set have non-negative values. It specifically does not guarantee that the letters are contiguous. (For all the gory details, see the N1570 draft of the C standard, section 5.2.1; the guarantee that basic characters are non-negative is in 6.2.5p3, in the discussion of type char.)
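As a minimal sketch of what the digit guarantee allows, the helper below (the name `digit_value` is hypothetical, not from any library) converts a decimal digit to its value using nothing but the contiguity the standard promises. The `static_assert` only restates that guarantee, so it should hold on any conforming implementation.

```c
#include <assert.h>

/* Guaranteed by the C standard (5.2.1): the decimal digits are contiguous. */
static_assert('0' + 9 == '9', "decimal digits must be contiguous");

/* Hypothetical helper: returns 0..9 for a decimal digit, or -1 otherwise. */
int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}
```

No comparable one-liner is guaranteed to work for letters, since the standard says nothing about their ordering or spacing.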

The assumption that 'a' .. 'f' and 'A' .. 'F' have contiguous codes is almost certainly a reasonable one. In ASCII and all ASCII-based character sets, the 26 lowercase letters are contiguous, as are the 26 uppercase letters. In EBCDIC, the only significant rival to ASCII, the alphabet as a whole is not contiguous, but even there the letters 'a' .. 'f' and 'A' .. 'F' are (EBCDIC has gaps between 'i' and 'j', between 'r' and 's', between 'I' and 'J', and between 'R' and 'S').
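One way to rely on this assumption without leaving it implicit is to have the compiler check it. The following is a sketch, assuming a C11 compiler; the helper name `hex_value` is made up for illustration. It handles the lowercase and uppercase ranges separately, so it needs only the contiguity of 'a' .. 'f' and 'A' .. 'F', not any particular distance between the two cases.

```c
#include <assert.h>

/* Compile-time check of the assumption: each run of six letters is contiguous. */
static_assert('b' == 'a' + 1 && 'c' == 'a' + 2 && 'd' == 'a' + 3 &&
              'e' == 'a' + 4 && 'f' == 'a' + 5,
              "'a'..'f' must be contiguous");
static_assert('B' == 'A' + 1 && 'C' == 'A' + 2 && 'D' == 'A' + 3 &&
              'E' == 'A' + 4 && 'F' == 'A' + 5,
              "'A'..'F' must be contiguous");

/* Hypothetical helper: value of a hex digit (0..15), or -1 if c is not one. */
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}
```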

However, the assumption that setting bit 5 (mask 0x20) of the representation will convert uppercase letters to lowercase is not valid for EBCDIC. In ASCII, the codes for the lowercase and uppercase letters differ by 32; in EBCDIC they differ by 64, and it is the uppercase letters that have the larger codes.
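If the bit trick is used anyway, the dependence on ASCII can be turned into a build failure on other character sets. This is only a sketch, again assuming C11; `fold_case_ascii` is a hypothetical name.

```c
#include <assert.h>

/* Refuses to compile where bit 5 does not map 'A'..'F' to 'a'..'f',
   e.g. on an EBCDIC target. */
static_assert(('A' | 0x20) == 'a' && ('F' | 0x20) == 'f',
              "bit-5 case folding assumes an ASCII-based character set");

/* Hypothetical helper: sets bit 5 (0x20). For ASCII letters this folds
   uppercase to lowercase; non-letter characters may change as well. */
unsigned char fold_case_ascii(unsigned char c)
{
    return (unsigned char)(c | 0x20);
}
```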

This kind of bit-twiddling to save an instruction or two might be reasonable in code that’s part of the standard library or that’s known to be performance-critical. The implicit assumption of an ASCII-based character set should IMHO at least be made explicit by a comment. A 256-element static lookup table would probably be even faster at the expense of a tiny amount of extra storage.
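A lookup table along those lines might look like the sketch below. The names `hex_table` and `hex_value_lut` are hypothetical, and the code assumes 8-bit bytes and C99 designated initializers. Because the table is indexed by character constants, it makes no assumption about the character set at all. Whether it is actually faster would need measuring.

```c
#include <assert.h>
#include <limits.h>

static_assert(UCHAR_MAX == 255, "table assumes 8-bit bytes");

/* Each entry stores value + 1, so the default 0 means "not a hex digit". */
static const unsigned char hex_table[256] = {
    ['0'] = 1,  ['1'] = 2,  ['2'] = 3,  ['3'] = 4,  ['4'] = 5,
    ['5'] = 6,  ['6'] = 7,  ['7'] = 8,  ['8'] = 9,  ['9'] = 10,
    ['a'] = 11, ['b'] = 12, ['c'] = 13, ['d'] = 14, ['e'] = 15, ['f'] = 16,
    ['A'] = 11, ['B'] = 12, ['C'] = 13, ['D'] = 14, ['E'] = 15, ['F'] = 16,
};

/* Hypothetical helper: value of a hex digit (0..15), or -1 if c is not one. */
int hex_value_lut(unsigned char c)
{
    return (int)hex_table[c] - 1;
}
```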
