Computing Standard Deviation in a stream

As outlined in the Wikipedia article on the standard deviation, it is enough to keep track of the following three sums:

s0 = sum(1 for x in samples)
s1 = sum(x for x in samples)
s2 = sum(x*x for x in samples)

These sums are easily updated as new values arrive. The standard deviation can be calculated as

std_dev = math.sqrt((s0 * s2 - s1 * s1)/(s0 * (s0 - 1)))

Note that this way of computing the standard deviation can be numerically ill-conditioned if your samples are floating point numbers and the standard deviation is small compared to the mean of the samples. If you expect samples of this type, you should resort to Welford’s method (see the accepted answer).

Leave a Comment