Reservoir sampling
I actually did not realize there was a name for this, so I proved and implemented this from scratch: import random def random_subset( iterator, K ): result = [] N = 0 for item in iterator: N += 1 if len( result ) < K: result.append( item ) else: s = int(random.random() * N) if … Read more