Creating combinations that have no more than one intersecting element

Question

Say you have n letters (or students, or whatever), and every week you want to partition them into subsets of size k (for a total of n/k subsets every week). This method will generate almost n/k subsets every week; I show below how to extend it to generate exactly n/k subsets.

Generating the Subsets (no partitioning)

First pick p, the largest prime <= n/k.
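A minimal sketch of this step in Python, assuming plain trial division is fast enough (it is for any class-sized n/k). The helper names is_prime and largest_prime_at_most are mine, not part of the method:

```python
def is_prime(m: int) -> bool:
    """Trial division; fine for the small values of n/k considered here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def largest_prime_at_most(limit: int) -> int:
    """Return the largest prime <= limit (limit must be at least 2)."""
    if limit < 2:
        raise ValueError("there is no prime <= limit")
    candidate = limit
    while not is_prime(candidate):
        candidate -= 1
    return candidate


n, k = 1000, 4                      # the values used later in this answer
p = largest_prime_at_most(n // k)   # n // k assumes we round n/k down
print(p)                            # 241
```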

Let’s consider every ordered pair (a,b) such that

0 <= a < k
0 <= b < p

We can map each pairing to one of our letters; thus, we can map p*k <= n letters this way (again, I show below how to map exactly n letters)

(0,0) => 'A'
(0,1) => 'B'
...
(0,p-1) => 'F'
(1,0)   => 'G'
(1,1)   => 'H'
...
(k-1,p-1) => 's'
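
One way to make the mapping concrete, assuming we number the letters row by row so that pair (a,b) is letter number a*p + b. The function names are made up for illustration:

```python
import string


def pair_to_index(a: int, b: int, p: int) -> int:
    """Position of pair (a, b) when pairs are listed row by row (assumed ordering)."""
    return a * p + b


def index_to_pair(index: int, p: int) -> tuple[int, int]:
    """Inverse of pair_to_index."""
    return divmod(index, p)


k, p = 3, 5
letters = string.ascii_uppercase[: k * p]   # 'A'..'O'
print(letters[pair_to_index(0, 0, p)])      # A
print(letters[pair_to_index(1, 0, p)])      # F
print(index_to_pair(14, p))                 # (2, 4), i.e. 'O'
```

Any other one-to-one assignment of letters to pairs would work just as well; only the pairs matter to the method.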

Now, given

0 <= w < p  
0 <= i < p

We can create a set S_w(i) of our ordered pairs. Each pairing in S_w(i) will represent one letter (according to our mapping above), and the set S_w(i) itself represents one “grouping of letters” aka. one subset of size k.

The formula for S_w(i) is

S_w(i) = {(0,i mod p), (1,(w+i) mod p), (2,(2w+i) mod p),..., ((k-1),((k-1)*w+i) mod p)}
      = { (x,y) | 0 <= x < k and y = w*x + i (mod p)}

If we vary w and i over all possible values, we get p² total sets. When we take any two of these sets, they will have at most one intersecting element.
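
Here is a small Python sketch of the formula, with a brute-force check of the "at most one intersecting element" claim for k = 3, p = 5. The name subset is just for illustration, and the check assumes k <= p:

```python
from itertools import combinations


def subset(w: int, i: int, k: int, p: int) -> frozenset:
    """S_w(i) = {(x, y) | 0 <= x < k and y = w*x + i (mod p)}."""
    return frozenset((x, (w * x + i) % p) for x in range(k))


k, p = 3, 5
all_sets = [subset(w, i, k, p) for w in range(p) for i in range(p)]

# Largest overlap between any two different sets.
max_overlap = max(len(s & t) for s, t in combinations(all_sets, 2))
print(len(all_sets), max_overlap)   # 25 1
```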

How it works

Say we have two sets S_w1(i₁) and S_w2(i₂). If S_w1(i₁) and S_w2(i₂) have more than one element in common, then there exist at least two values of x such that

w₁*x + i₁ = w₂*x + i₂ (mod p)  
(w₁-w₂)*x + (i₁-i₂) = 0 (mod p)

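However, anyone who’s taken modular arithmetic knows that if p is prime, this congruence has exactly one solution x (mod p) when w₁ ≠ w₂, has no solution when w₁ = w₂ and i₁ ≠ i₂, and holds for every x only when w₁ = w₂ and i₁ = i₂ (i.e. the two sets are the same set). Since the values x = 0, ..., k-1 are all distinct mod p (this needs k <= p), two different sets cannot share more than one x, and S_w1(i₁) and S_w2(i₂) can have at most one intersecting element.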
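
To see the role of the prime modulus, here is a quick brute-force check in Python. It counts the solutions of (w₁-w₂)*x + (i₁-i₂) = 0 for a prime modulus and shows what goes wrong for a composite one. The function name and the parameters dw, di are my own shorthand for w₁-w₂ and i₁-i₂:

```python
def solutions(dw: int, di: int, m: int) -> list[int]:
    """All x in 0..m-1 with dw*x + di = 0 (mod m)."""
    return [x for x in range(m) if (dw * x + di) % m == 0]


p = 5  # prime
# Different w (dw != 0): exactly one solution, whatever the i's are.
assert all(len(solutions(dw, di, p)) == 1 for dw in range(1, p) for di in range(p))

print(solutions(0, 1, p))   # []      same w, different i: no shared element
print(solutions(2, 0, 4))   # [0, 2]  modulus 4 is not prime: two shared elements
```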

Analysis

Since p is the largest prime <= n/k, by Chebyshev’s Theorem (also known as Bertrand’s postulate, which states there is a prime between x and 2x for x > 3)

n/(2k) < p <= n/k

Thus, this method generates at least (n/(2k))² subsets of letters, though in practice p will be nearer to n/k, so the number will be nearer to (n/k)². Since a simple upper bound for the maximum possible number of such subsets is n(n-1)/(k(k-1)) (see BlueRaja’s comment below), this means the algorithm is asymptotically optimal, and will generate near the optimal number of sets (even in the worst case, it won’t generate fewer than about 1/4th the optimal number; see again the comment below).
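
To get a feel for how close p² comes to the upper bound, something like the following can be used. It is only a sketch: generated_vs_bound is a name I made up, and the bound is rounded down since the number of subsets is an integer.

```python
from math import isqrt


def is_prime(m: int) -> bool:
    """Trial division up to sqrt(m)."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))


def generated_vs_bound(n: int, k: int) -> tuple[int, int]:
    """Return (p*p, floor(n(n-1) / (k(k-1)))) for the largest prime p <= n/k."""
    p = n // k
    if p < 2 or k < 2:
        raise ValueError("need n/k >= 2 and k >= 2")
    while not is_prime(p):
        p -= 1
    return p * p, n * (n - 1) // (k * (k - 1))


for n, k in [(15, 3), (100, 4), (1000, 4)]:
    made, bound = generated_vs_bound(n, k)
    print(f"n={n} k={k}: {made} generated, bound {bound} ({made / bound:.0%})")
```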

Partitioning

You now want to group the letters into partitions each week: each week, all letters are included in exactly one group.
We do this by letting w be fixed to a certain value (representing the week) and letting i vary from 0 to p-1.

Proof

Consider the groups we created:

S_w(i) = { (x,y) | 0 <= x < k and y = w*x + i (mod p)}

Let’s say w is fixed and i varies from 0 to p-1. Then we get p sets:

S_w(0), S_w(1), ..., S_w(p-1)

Now let’s say S_w(i₁) and S_w(i₂) (with i₁ =/= i₂) intersect; then

w*x + i₁ = w*x + i₂ (mod p)

for some x, and hence i₁ = i₂. Thus, S_w(i₁) and S_w(i₂) don’t intersect.

Since no two of our sets intersect, and there are exactly p of them (each with k elements), our sets form a partition of the k*p letters.
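
The same argument can be checked by brute force. This sketch (week_partition is an illustrative name) builds the p groups for each week with k = 3, p = 5 and confirms that they are disjoint and cover all k*p pairs:

```python
def week_partition(w: int, k: int, p: int) -> list[set]:
    """The p groups S_w(0), ..., S_w(p-1) for a fixed week w."""
    return [{(x, (w * x + i) % p) for x in range(k)} for i in range(p)]


k, p = 3, 5
everyone = {(x, y) for x in range(k) for y in range(p)}

for w in range(p):
    groups = week_partition(w, k, p)
    covered = set().union(*groups)
    # If the sizes add up to k*p and the union is everyone, no pair was repeated.
    assert sum(len(g) for g in groups) == k * p
    assert covered == everyone

print("every week is a partition")
```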

Generating n/k Subsets Each Week

The biggest disadvantage of this method is that it generates sets for p*k letters, rather than n letters. If the last letters can’t be left out (as in your case, where the letters are students), there are two ways to generate exactly n/k subsets each week:

1. Find a set of prime numbers p₁, p₂, p₃, … which sums to exactly n/k. Then we can treat each group of p_i*k letters as an independent alphabet, so that rather than finding subsets of p*k letters, we find one group of subsets for p₁*k letters, another group of subsets for p₂*k letters, another group…
   This has the disadvantage that letters from one group will never be paired with letters from another group, reducing the total number of subsets generated. Luckily, if n/k is even, by Goldbach’s conjecture† you will only need two groups at the most (if n/k is odd, you will only need three at most).
   This method guarantees subsets of size k, but doesn’t generate as many subsets.
   † Though unproven, it is known to be true for every ridiculously large number you will likely encounter for this problem.
2. The other option is to use the smallest prime p >= n/k. This will give you p*k >= n letters; after the subsets have been generated, simply throw out the extra letters. Thus, in the end this gives you some subsets with size < k. Assuming k divides n evenly (i.e. n/k is an integer), you could take the smaller subsets and mix them up by hand to make subsets of size k, but you risk having some overlap with past/future subsets this way.
   This method generates at least as many subsets as the original method, but some may have size < k.
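
Picking the primes for either option is easy to do by brute force. The sketch below is one way to do it; both function names are made up, and preferring the most balanced pair of primes is my own choice (the schedule only lasts as many weeks as the smallest prime, so a balanced split seemed sensible).

```python
from math import isqrt


def is_prime(m: int) -> bool:
    """Trial division up to sqrt(m)."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))


def two_primes_summing_to(total: int):
    """First option: primes (p1, p2), p1 <= p2, with p1 + p2 == total.

    Tries the most balanced split first; returns None if no pair exists.
    """
    for p1 in range(total // 2, 1, -1):
        if is_prime(p1) and is_prime(total - p1):
            return p1, total - p1
    return None


def smallest_prime_at_least(limit: int) -> int:
    """Second option: the smallest prime p >= limit."""
    candidate = max(limit, 2)
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(two_primes_summing_to(250))     # (113, 137)
print(smallest_prime_at_least(250))   # 251
```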

Example

Take n = 15, k = 3. i.e. there are 15 students and we are making groups of three.

To begin with, we pick the largest prime p <= n/k. n/k is prime (lucky us!), so p = 5.

We map the 15 students into the ordered pairs (a,b) described above, giving us (each letter is a student):

(0,0) = A
(0,1) = B
(0,2) = C
(0,3) = D
(0,4) = E

(1,0) = F
(1,1) = G
(1,2) = H
(1,3) = I
(1,4) = J

(2,0) = K
(2,1) = L
(2,2) = M
(2,3) = N
(2,4) = O

The method generates 25 groups of three. Thus, since we need to schedule n/k = 5 groups each week, we can schedule 5 weeks of activities (5 groups a week * 5 weeks = 25 groups).

For week 0, we generate the partition as

S₀(i), for i = 0 to 4.

S₀(0) = { (0,0), (1,0), (2,0) } = AFK
S₀(1) = { (0,1), (1,1), (2,1) } = BGL
S₀(2) = { (0,2), (1,2), (2,2) } = CHM
S₀(3) = { (0,3), (1,3), (2,3) } = DIN
S₀(4) = { (0,4), (1,4), (2,4) } = EJO

For week 4 it will be

S₄(i) for i = 0 to 4.

S₄(0) = { (0,0), (1, (4*1 + 0) mod 5), (2, (4*2 + 0) mod 5) }
      = { (0,0), (1,4), (2,3) }
      = AJN
S₄(1) = { (0,1), (1, (4*1 + 1) mod 5), (2, (4*2 + 1) mod 5) }
      = { (0,1), (1,0), (2,4) }
      = BFO
S₄(2) = { (0,2), (1, (4*1 + 2) mod 5), (2, (4*2 + 2) mod 5) }
      = { (0,2), (1,1), (2,0) }
      = CGK
S₄(3) = { (0,3), (1, (4*1 + 3) mod 5), (2, (4*2 + 3) mod 5) }
      = { (0,3), (1,2), (2,1) }
      = DHL
S₄(4) = { (0,4), (1, (4*1 + 4) mod 5), (2, (4*2 + 4) mod 5) }
      = { (0,4), (1,3), (2,2) }
      = EIM

Here’s the schedule for all 5 weeks:

Week: 0
S₀(0) ={(0,0) (1,0) (2,0) } = AFK
S₀(1) ={(0,1) (1,1) (2,1) } = BGL
S₀(2) ={(0,2) (1,2) (2,2) } = CHM
S₀(3) ={(0,3) (1,3) (2,3) } = DIN
S₀(4) ={(0,4) (1,4) (2,4) } = EJO

Week: 1
S₁(0) ={(0,0) (1,1) (2,2) } = AGM
S₁(1) ={(0,1) (1,2) (2,3) } = BHN
S₁(2) ={(0,2) (1,3) (2,4) } = CIO
S₁(3) ={(0,3) (1,4) (2,0) } = DJK
S₁(4) ={(0,4) (1,0) (2,1) } = EFL

Week: 2
S₂(0) ={(0,0) (1,2) (2,4) } = AHO
S₂(1) ={(0,1) (1,3) (2,0) } = BIK
S₂(2) ={(0,2) (1,4) (2,1) } = CJL
S₂(3) ={(0,3) (1,0) (2,2) } = DFM
S₂(4) ={(0,4) (1,1) (2,3) } = EGN

Week: 3
S₃(0) ={(0,0) (1,3) (2,1) } = AIL
S₃(1) ={(0,1) (1,4) (2,2) } = BJM
S₃(2) ={(0,2) (1,0) (2,3) } = CFN
S₃(3) ={(0,3) (1,1) (2,4) } = DGO
S₃(4) ={(0,4) (1,2) (2,0) } = EHK

Week: 4
S₄(0) ={(0,0) (1,4) (2,3) } = AJN
S₄(1) ={(0,1) (1,0) (2,4) } = BFO
S₄(2) ={(0,2) (1,1) (2,0) } = CGK
S₄(3) ={(0,3) (1,2) (2,1) } = DHL
S₄(4) ={(0,4) (1,3) (2,2) } = EIM
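
For reference, a schedule like the one above can be produced with a few lines of Python. This is only a sketch: the function name schedule is mine, and it assumes the row-by-row lettering used in this example, where pair (x,y) is letter number x*p + y.

```python
import string


def schedule(k: int, p: int, names: str):
    """Yield (week, groups); each group is the string of its members' names."""
    for w in range(p):
        groups = []
        for i in range(p):
            # Member x of S_w(i) is the pair (x, (w*x + i) mod p).
            members = [names[x * p + (w * x + i) % p] for x in range(k)]
            groups.append("".join(members))
        yield w, groups


for week, groups in schedule(3, 5, string.ascii_uppercase[:15]):
    print(f"Week {week}:", " ".join(groups))
```

The first line printed is "Week 0: AFK BGL CHM DIN EJO" and the last is "Week 4: AJN BFO CGK DHL EIM".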

More Practical Example

In your case, n = 1000 students and k = 4 in each group. Thus, we pick p as the largest prime <= (n/k = 1000/4 = 250), so p = 241. Without considering the alterations above under “Generating n/k Subsets Each Week”, this method will generate a schedule for 241*4 = 964 students lasting 241 weeks.

(An upper bound for the maximum number of subsets possible would be 1000*999/(4*3) = 83250, though the actual number is likely less than that. Even so, this method generates 241² = 58081 subsets, or about 70% of the theoretical maximum!)

If we use the first method above to generate a schedule for exactly 1000 students, we take p₁ = 113, p₂ = 137 (so that p₁ + p₂ = n/k). Thus, we can generate (113)^2 + (137)^2 = 31,538 subsets of students, enough to last 113 weeks.

If we use the second method above to generate a schedule for exactly 1000 students, we take p = 251. This will give us a schedule for 1004 students for 251 weeks; we remove the 4 phantom students from the schedule each week. Usually, this will result in four groups of 3 every week (though unlikely, it is also possible to have for example one group of 2 and two groups of 3). The groups with < 4 students will always have a multiple-of-4 total number of students, so you could manually place those students into groups of 4, at the risk of potentially having two of those students together again later in another group.
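
Here is a sketch of the second method for these numbers. It assumes students are numbered row by row (pair (x,y) is student x*p + y) and that the phantoms are simply the numbers 1000 to 1003; the answer doesn't fix either choice, and week_groups is a made-up name. With this particular numbering all four phantoms share the same x, so they never land in the same group and every week has exactly four groups of 3; the "one group of 2" case can only come up if the phantoms are chosen differently.

```python
from collections import Counter


def week_groups(w: int, k: int, p: int, n: int) -> list[list[int]]:
    """Groups for week w as lists of student numbers, dropping numbers >= n."""
    groups = []
    for i in range(p):
        members = [x * p + (w * x + i) % p for x in range(k)]
        groups.append([m for m in members if m < n])   # m >= n is a phantom
    return groups


n, k, p = 1000, 4, 251
for w in range(p):
    sizes = Counter(len(g) for g in week_groups(w, k, p, n))
    assert sizes == Counter({4: 247, 3: 4})

print("every week: 247 groups of 4 and 4 groups of 3")
```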

Final thoughts

One flaw of this algorithm is that it’s not really flexible: if a student drops out, we are forever stuck with a phantom student. Also, there is no way to add new students to the schedule midway through the year (unless we allow for them by initially creating phantom students).

This problem falls under the category of Restricted Set Systems in combinatorics. See this paper for more information, especially Chapters 1 and 2. Since it is a postscript file, you will need gsview or something to view it.