The intersection of all combinations of n sets

Here is a solution inspired by MapReduce: Simplified Data Processing on Large Clusters, which can therefore be written in a distributed way if you want.

Map all of your elements in sets to pairs [element, set]. Group this list by element. You can just sort and pull out elements. Or you can create a hash of arrays whose keys are the elements and values are the list of sets that element is found in. Then for each element, emit a list of [combination of sets, element]. Group that by combination. And now for each non-empty combination, you can just read off the elements in it.

If you wish to distribute the computation with a real map-reduce, the first map would map to a key of element, and value of set. The first reduce would just store by element the list of sets it is in. The second map would emit for each element one key for each combination of sets it is in, with the element as the value. And the second reduce would store your final answer.

It may help to work your example in detail. You start with:

Set 1 = 1, 10, 6, 11, 14, 3
Set 2 = 3, 7, 11, 9, 5
Set 3 = 11, 6, 9, 1, 4

You map that to the list:

[1, Set 1], [10, Set 1], [6, Set 1], [11, Set 1], [14, Set 1], [3, Set 1],
[3, Set 2], [7, Set 2], [11, Set 2], [9, Set 2], [5, Set 2],
[11, Set 3], [6, Set 3], [9, Set 3], [1, Set 3], [4, Set 3],

Now group by element (I did it by hand by sorting) to get:

1: Set 1, Set 3
3: Set 1, Set 2
4: Set 3
5: Set 2
6: Set 1, Set 3
7: Set 2
9: Set 2, Set 3
10: Set 1
11: Set 1, Set 2, Set 3
14: Set 1

And now we do the second mapping (skipping the elements that are only in one set) to get:

[(Set 1, Set 3), 1],
[(Set 1, Set 2), 3],
[(Set 1, Set 3), 6],
[(Set 2, Set 3), 9],
[(Set 1, Set 2), 11],
[(Set 1, Set 3), 11],
[(Set 2, Set 3), 11],
[(Set 1, Set 2, Set 3), 11]

Group that by combination of sets and we get:

(Set 1, Set 2): 3, 11
(Set 1, Set 3): 1, 6, 11
(Set 2, Set 3): 9, 11
(Set 1, Set 2, Set 3): 11

Leave a Comment