Simple Random Samples from a MySQL Sql database

I think the fastest solution is

select * from table where rand() <= .3

Here is why I think this should do the job.

  • It will create a random number for each row. The number is between 0 and 1
  • It evaluates whether to display that row if the number generated is between 0 and .3 (30%).

This assumes that rand() is generating numbers in a uniform distribution. It is the quickest way to do this.

I saw that someone had recommended that solution and they got shot down without proof.. here is what I would say to that –

  • This is O(n) but no sorting is required so it is faster than the O(n lg n)
  • mysql is very capable of generating random numbers for each row. Try this –

    select rand() from INFORMATION_SCHEMA.TABLES limit 10;

Since the database in question is mySQL, this is the right solution.

Leave a Comment