How to rank a million images with a crowdsourced sort

As others have said, rating each picture 1-10 does not work that well because people calibrate the scale differently: one voter's 7 is another voter's 5.

The problem with the Pick A-or-B method is that the results are not guaranteed to be transitive (A can beat B, B can beat C, and C can still beat A). A nontransitive comparison operator breaks sorting algorithms. With quicksort on this example, the two letters not chosen as the pivot are ordered only through the pivot, and that order contradicts their direct comparison.
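To make the cycle concrete, here is a small sketch (the `BEATS` set and the `violations` helper are made up for illustration). It checks every possible ordering of A, B and C and counts how many votes each ordering contradicts.

```python
from itertools import permutations

# Hypothetical vote outcomes forming a cycle: A beats B, B beats C, C beats A.
BEATS = {("A", "B"), ("B", "C"), ("C", "A")}

def violations(order):
    """Count pairs where a lower-placed picture beat a higher-placed one."""
    return sum(
        1
        for i, higher in enumerate(order)
        for lower in order[i + 1:]
        if (lower, higher) in BEATS
    )

# Every ordering, best first, with the number of votes it contradicts.
for order in permutations("ABC"):
    print("".join(order), violations(order))
```

No ordering gets a count of zero, so whatever a sort returns here disagrees with at least one vote.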

At any given time, you want an absolute ranking of all the pictures (even if some/all of them are tied). You also want your ranking not to change unless someone votes.

I would use the Pick A-or-B (or tie) method, but determine the ranking with something like the Elo rating system, which is used to rank players in two-player games (originally chess):

The Elo player-rating system compares players’ match records against their opponents’ match records and determines the probability of the player winning the matchup. This probability factor determines how many points a player’s rating goes up or down based on the results of each match. When a player defeats an opponent with a higher rating, the player’s rating goes up more than if he or she defeated a player with a lower rating (since players should defeat opponents who have lower ratings).

The Elo System:

  1. All new players start out with a base rating of 1600
  2. WinProbability = 1 / (10^((Opponent’s Current Rating - Player’s Current Rating) / 400) + 1)
  3. ScoringPt = 1 point if they win the match, 0 if they lose, and 0.5 for a draw.
  4. Player’s New Rating = Player’s Old Rating + (K-Value * (ScoringPt - Player’s Win Probability))

Replace “players” with pictures and you have a simple way of adjusting both pictures’ ratings after each vote. You can then rank the pictures by those numeric scores. (The K-Value is the “level” of the tournament: 8-16 for small local tournaments and 24-32 for larger invitationals/regionals. Here you can just use a constant like 20.)
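The four steps can be sketched in a few lines of Python. This is only an illustration: the function names are made up, K is fixed at 20 as suggested above, and a vote is passed in as picture A's score (1 for a win, 0 for a loss, 0.5 for a tie).

```python
BASE_RATING = 1600.0  # step 1: every new picture starts here
K = 20.0              # assumed constant K-Value

def win_probability(rating, opponent_rating):
    """Step 2: probability that `rating` beats `opponent_rating`."""
    return 1.0 / (10 ** ((opponent_rating - rating) / 400.0) + 1.0)

def update(rating_a, rating_b, score_a, k=K):
    """Steps 3-4: return both new ratings after one vote.

    score_a is 1.0 if picture A was picked, 0.0 if B was picked, 0.5 for a tie.
    """
    expected_a = win_probability(rating_a, rating_b)
    expected_b = win_probability(rating_b, rating_a)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - expected_b)
    return new_a, new_b

# Two fresh pictures, A wins: A gains 10 points and B loses 10.
print(update(BASE_RATING, BASE_RATING, 1.0))
```

A tie between two equally rated pictures leaves both ratings unchanged, and an upset (the lower-rated picture winning) moves more than 10 points.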

With this method, you only need to keep one number for each picture, which takes far less memory than storing how each picture ranks against every other picture.
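As a rough sketch of what that storage looks like, a single dict from picture to rating is enough to produce an absolute ranking with ties. The file names and the `rank_pictures` helper below are hypothetical, and ties are detected by exact equality of the stored ratings (for example, pictures nobody has voted on yet).

```python
def rank_pictures(ratings):
    """Return {picture_id: rank}; rank 1 is best, equal ratings share a rank."""
    ordered = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    previous_rating = None
    current_rank = 0
    for position, (picture_id, rating) in enumerate(ordered, start=1):
        # Only advance the rank when the rating actually changes.
        if rating != previous_rating:
            current_rank = position
            previous_rating = rating
        ranks[picture_id] = current_rank
    return ranks

# One number per picture is all the state that is kept.
ratings = {"cat.jpg": 1610.0, "dog.jpg": 1590.0, "owl.jpg": 1600.0, "fox.jpg": 1600.0}
print(rank_pictures(ratings))
```

Here cat.jpg is ranked 1, owl.jpg and fox.jpg tie at 2, and dog.jpg is ranked 4. The ranking only changes when a vote changes a rating.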

EDIT: Added a little more meat based on comments.
