Random Weighted Choice in T-SQL

Dane’s answer includes a self joins in a way that introduces a square law. (n*n/2) rows after the join where there are n rows in the table.

What would be more ideal is to be able to just parse the table once.

DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

SELECT @weight_point = FLOOR(((@weight_sum - 1) * RAND() + 1))

SELECT
    @id = CASE WHEN @weight_point < 0 THEN @id ELSE [table].id END,
    @weight_point = @weight_point - [table].weight
FROM
    @table [table]
ORDER BY
    [table].Weight DESC

This will go through the table, setting @id to each record’s id value while at the same time decrementing @weight point. Eventually, the @weight_point will go negative. This means that the SUM of all preceding weights is greater than the randomly chosen target value. This is the record we want, so from that point onwards we set @id to itself (ignoring any IDs in the table).

This runs through the table just once, but does have to run through the entire table even if the chosen value is the first record. Because the average position is half way through the table (and less if ordered by ascending weight) writing a loop could possibly be faster… (Especially if the weightings are in common groups):

DECLARE @id int, @weight_sum int, @weight_point int, @next_weight int, @row_count int
DECLARE @table TABLE (id int, weight int)

INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)

SELECT @weight_sum = SUM(weight)
FROM @table

SELECT @weight_point = ROUND(((@weight_sum - 1) * RAND() + 1), 0)

SELECT @next_weight = MAX(weight) FROM @table
SELECT @row_count   = COUNT(*)    FROM @table WHERE weight = @next_weight
SET @weight_point = @weight_point - (@next_weight * @row_count)

WHILE (@weight_point > 0)
BEGIN
    SELECT @next_weight = MAX(weight) FROM @table WHERE weight < @next_weight
    SELECT @row_count   = COUNT(*)    FROM @table WHERE weight = @next_weight
    SET @weight_point = @weight_point - (@next_weight * @row_count)
END

-- # Once the @weight_point is less than 0, we know that the randomly chosen record
-- # is in the group of records WHERE [table].weight = @next_weight

SELECT @row_count = FLOOR(((@row_count - 1) * RAND() + 1))

SELECT
    @id = CASE WHEN @row_count < 0 THEN @id ELSE [table].id END,
    @row_count = @row_count - 1
FROM
    @table [table]
WHERE
    [table].weight = @next_weight
ORDER BY
    [table].Weight DESC

Leave a Comment