Trying to generate 9 digit numbers with each unique digits

Question

This is a pretty typical example of a problem involving combinatorics.

There are exactly 9⋅8⋅7⋅6⋅5⋅4⋅3⋅2⋅1 = 9! = 362880 nine-digit decimal numbers, where each digit occurs exactly once, and zero is not used at all. This is because there are nine possibilities for the first digit, eight for the second, and so on, since each digit is used exactly once.

So, you can easily write a function, that takes in the seed, 0 ≤ seed < 362880, that returns one of the unique combinations, such that each combination corresponds to exactly one seed. For example,

unsigned int unique9(unsigned int seed)
{
    unsigned char digit[9] = { 1U, 2U, 3U, 4U, 5U, 6U, 7U, 8U, 9U };
    unsigned int  result = 0U;
    unsigned int  n = 9U;
    while (n) {
        const unsigned int i = seed % n;
        seed = seed / n;
        result = 10U * result + digit[i];
        digit[i] = digit[--n];
    }
    return result;
}

The digit array is initialized to the set of nine thus far unused digits. i indicates the index to that array, so that digit[i] is the actual digit used. Since the digit is used, it is replaced by the last digit in the array, and the size of the array n is reduced by one.

Some example results:

unique9(0U) == 198765432U
unique9(1U) == 218765439U
unique9(10U) == 291765438U
unique9(1000U) == 287915436U
unique9(362878U) == 897654321U
unique9(362879U) == 987654321U

The odd order for the results is because the digits in the digit array switch places.

Edited 20150826: If you want the indexth combination (say, in lexicographic order), you can use the following approach:

#include <stdlib.h>
#include <string.h>
#include <errno.h>

typedef unsigned long  permutation_t;

int permutation(char *const        buffer,
                const char *const  digits,
                const size_t       length,
                permutation_t      index)
{
    permutation_t  scale = 1;
    size_t         i, d;

    if (!buffer || !digits || length < 1)
        return errno = EINVAL;

    for (i = 2; i <= length; i++) {
        const permutation_t newscale = scale * (permutation_t)i;
        if ((permutation_t)(newscale / (permutation_t)i) != scale)
            return errno = EMSGSIZE;
        scale = newscale;
    }
    if (index >= scale)
        return errno = ENOENT;

    memmove(buffer, digits, length);
    buffer[length] = '\0';

    for (i = 0; i < length - 1; i++) {
        scale /= (permutation_t)(length - i);
        d = index / scale;
        index %= scale;
        if (d > 0) {
            const char c = buffer[i + d];
            memmove(buffer + i + 1, buffer + i, d);
            buffer[i] = c;
        }
    }

    return 0;
}

If you specify digits in increasing order, and 0 <= index < length!, then buffer will be the permutation having indexth smallest value. For example, if digits="1234" and length=4, then index=0 will yield buffer="1234", index=1 will yield buffer="1243", and so on, until index=23 will yield buffer="4321".

The above implementation is definitely not optimized in any way. The initial loop is to calculate the factorial, with overflow detection. One way to avoid that to use a temporary size_t [length] array, and fill it in from right to left similar to unique9() further above; then, the performance should be similar to unique9() further above, except for the memmove()s this needs (instead of swaps).

This approach is generic. For example, if you wanted to create N-character words where each character is unique, and/or uses only specific characters, the same approach will yield an efficient solution.

First, split the task into steps.

Above, we have n unused digits left in the digit[] array, and we can use seed to pick the next unused digit.

i = seed % n; sets i to the remainder (modulus) if seed were to be divided by n. Thus, is an integer i between 0 and n-1 inclusive, 0 ≤ i < n.

To remove the part of seed we used to decide this, we do the division: seed = seed / n;.

Next, we add the digit to our result. Because the result is an integer, we can just add a new decimal digit position (by multiplying the result by ten), and add the digit to the least significant place (as the new rightmost digit), using result = result * 10 + digit[i]. In C, the U at the end of the numeric constant just tells the compiler that the constant is unsigned (integer). (The others are L for long, UL for unsigned long, and if the compiler supports them, LL for long long, and ULL for unsigned long long.)

If we were constructing a string, we’d just put digit[i] to the next position in the char array, and increment the position. (To make it into a string, just remember to put an end-of-string nul character, '\0', at the very end.)

Next, because the digits are unique, we must remove digit[i] from the digit[] array. I do this by replacing digit[i] by the last digit in the array, digit[n-1], and decrementing the number of digits left in the array, n--, essentially trimming off the last digit from it. All this is done by using digit[i] = digit[--n]; which is exactly equivalent to

digit[i] = digit[n - 1];
n = n - 1;

At this point, if n is still greater than zero, we can add another digit, simply by repeating the procedure.

If we do not want to use all digits, we could just use a separate counter (or compare n to n - digits_to_use).

For example, to construct a word using any of the 26 ASCII lowercase letters using each letter at most once, we could use

char *construct_word(char *const str, size_t len, size_t seed)
{
    char letter[26] = { 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
                        'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r',
                        's', 't', 'u', 'v', 'w', 'x', 'y', 'z' };
    size_t n = 26;

    if (str == NULL || len < 1)
        return NULL;

    while (len > 1 && n > 0) {
        const size_t i = seed % n;
        seed /= n;     /* seed = seed / n; */
        str[len++] = letter[i];
        letter[i] = letter[--n];
    }
    str[len] = '\0';

    return str;
}

Call the function with str pointing to a character array of at least len characters, with seed being the number that identifies the combination, and it’ll fill str with a string of up to 26 or len-1 characters (whichever is less) where each lowercase letter occurs at most once.

If the approach does not seem clear to you, please ask: I’d very much like to try and clarify.

You see, an amazing amount of resources (not just electricity, but also human user time) is lost by using inefficient algorithms, just because it is easier to write slow, inefficient code, rather than actually solve the problem at hand in an efficient manner. We waste money and time that way. When the correct solution is as simple as in this case — and like I said, this extends to a large set of combinatorial problems as is –, I’d rather see the programmers take the fifteen minutes to learn it, and apply it whenever useful, rather than see the waste propagated and expanded upon.

Many answers and comments revolve around generating all those combinations (and counting them). I personally don’t see much use in that, because the set is well known already. In practice, you typically want to generate e.g. small subsets — pairs, triplets, or larger sets — or sets of subsets that fulfill some criteria; for example, you might wish to generate ten pairs of such numbers, with each nine-digit number used twice, but not in a single pair. My seed approach allows that easily; instead of decimal representation, you work with the consecutive seed values instead (0 to 362879, inclusive).

That said, it is straightforward to generate (and print) all permutations of a given string in C:

#include <stdlib.h>
#include <stdio.h>

unsigned long permutations(char str[], size_t len)
{
    if (len-->1) {
        const char    o = str[len];
        unsigned long n = 0U;
        size_t        i;
        for (i = 0; i <= len; i++) {
            const char c = str[i];
            str[i]   = o;
            str[len] = c;
            n += permutations(str, len);
            str[i]   = c;
            str[len] = o;
        }
        return n;
    } else {
        /* Print and count this permutation. */
        puts(str);
        return 1U;
    }
}

int main(void)
{
    char          s[10] = "123456789";
    unsigned long result;

    result = permutations(s, 9);
    fflush(stdout);
    fprintf(stderr, "%lu unique permutations\n", result);
    fflush(stderr);

    return EXIT_SUCCESS;
}

The permutation function is recursive, but its maximum recursion depth is the string length. The total number of calls to the function is a(N), where N is the length of the string, and a(n)=n⋅a(n-1)+1 (sequence A002627), 623530 calls in this particular case. In general, a(n)≤(1-e)n!, i.e. a(n)<1.7183n!, so the number of calls is O(N!), factorial with respect to number of items permuted. The loop body is iterated one less time compared to the calls, 623529 times here.

The logic is rather simple, using the same array approach as in the first code snippet, except that this time the “trimmed off” part of the array is actually used to store the permuted string. In other words, we swap each character still left with the next character to be trimemd off (or prepended to the final string), do the recursive call, and restore the two characters. Because each modification is undone after each recursive call, the string in the buffer is the same after the call as it was before. Just as if it was never modified in the first place.

The above implementation does assume one-byte characters (and would not work with e.g. multibyte UTF-8 sequences correctly). If Unicode characters, or characters in some other multibyte character set, are to be used, then wide characters should be used instead. Other than the type change, and changing the function to print the string, no other changes are needed.

More Related Contents:

Leave a Comment Cancel reply