How does strcmp() work?

The pseudo-code “implementation” of strcmp would go something like:

define strcmp (s1, s2):
    p1 = address of first character of str1
    p2 = address of first character of str2

    while contents of p1 not equal to null:
        if contents of p2 equal to null: 
            return 1

        if contents of p2 greater than contents of p1:
            return -1

        if contents of p1 greater than contents of p2:
            return 1

        advance p1
        advance p2

    if contents of p2 not equal to null:
        return -1

    return 0

That’s basically it. Each character is compared in turn an a decision is made as to whether the first or second string is greater, based on that character.

Only if the characters are identical do you move to the next character and, if all the characters were identical, zero is returned.

Note that you may not necessarily get 1 and -1, the specs say that any positive or negative value will suffice, so you should always check the return value with < 0, > 0 or == 0.

Turning that into real C would be relatively simple:

int myStrCmp (const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    while (*p1 != '\0') {
        if (*p2 == '\0') return  1;
        if (*p2 > *p1)   return -1;
        if (*p1 > *p2)   return  1;

        p1++;
        p2++;
    }

    if (*p2 != '\0') return -1;

    return 0;
}

Also keep in mind that “greater” in the context of characters is not necessarily based on simple ASCII ordering for all string functions.

C has a concept called ‘locales’ which specify (among other things) collation, or ordering of the underlying character set and you may find, for example, that the characters a, á, à and ä are all considered identical. This will happen for functions like strcoll.

Leave a Comment