1

https://math.stackexchange.com/a/116344/25814 provides a mechanism to calculate the running variance of a stream of values. I'd like to replace one of the values included in the running variance with a different value.

For example, millions of users are asked a question, and I calculate the variance of their answers. At some point, a user goes back and changes their answer. I'd like to calculate the new variance as if the new value had been used in the original calculation.

Recalculating the variance from the beginning is a very expensive process. Is there a way to update the running variance without restarting the calculation?

Gili

2 Answers

1

Answering my own question. If you take the formulas in "Calculate variance from a stream of sample values" and solve them for $m_{k-1}$ and $v_{k-1}$, you end up with the following:

$$ \begin{align*} m_{k-1} & = \frac{k \, m_k - x_k}{k - 1} \\ v_{k-1} & = v_k - (x_k - m_{k-1})(x_k - m_k) \end{align*} $$

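Here $x_k$ is the value being removed and $k$ is the number of values before the removal. Because the mean and the sum of squared deviations do not depend on the order of the values, $x_k$ can be any value already included, not just the most recent one. To replace a value, remove the old one with these formulas, then add the new one with the usual forward update.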
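Here is a minimal Python sketch of the remove-then-add idea. It assumes that $v_k$ in the linked formulas is the running sum of squared deviations (so the sample variance is $v_k / (k - 1)$), as in Welford's method. The function names `add_value`, `remove_value`, and `replace_value` are made up for illustration.

```python
def add_value(k, mean, v, x):
    """Forward update: include x in a state that currently holds k values."""
    k += 1
    new_mean = mean + (x - mean) / k
    v += (x - mean) * (x - new_mean)
    return k, new_mean, v


def remove_value(k, mean, v, x):
    """Inverse update: exclude x from a state that currently holds k values."""
    if k < 1:
        raise ValueError("nothing to remove")
    if k == 1:
        # Removing the only value leaves an empty state.
        return 0, 0.0, 0.0
    prev_mean = (k * mean - x) / (k - 1)
    v -= (x - prev_mean) * (x - mean)
    return k - 1, prev_mean, v


def replace_value(k, mean, v, old, new):
    """Replace one occurrence of old with new: remove, then add."""
    k, mean, v = remove_value(k, mean, v, old)
    return add_value(k, mean, v, new)


# Build the state for the answers [3, 5, 7, 9], then change the 5 to 11.
k, mean, v = 0, 0.0, 0.0
for x in [3, 5, 7, 9]:
    k, mean, v = add_value(k, mean, v, x)
k, mean, v = replace_value(k, mean, v, old=5, new=11)
sample_variance = v / (k - 1)
```

Each replacement costs a constant amount of work regardless of how many values are in the stream. Rounding error can build up over many removals, so it may be worth recomputing from scratch occasionally if that matters for your data.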

Gili
0

Take a look at the formula for variance and just subtract the value you want to remove every place it occurs. Remember to subtract 1 from the count.

Then just do your usual running variance with the new value.
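One possible reading of this, sketched in Python: if the running state is kept as the count, the sum, and the sum of squares, then removing a value means subtracting it from each of the three. This is an assumption about what is meant here, not a formula taken from the answer, and the class name `RunningSums` is hypothetical.

```python
class RunningSums:
    """Tracks count, sum, and sum of squares so values can be added or removed."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x):
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def remove(self, x):
        # Subtract x every place it occurs and reduce the count by one.
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x

    def variance(self):
        # Sample variance: (sum of squares - sum^2 / n) / (n - 1).
        if self.n < 2:
            raise ValueError("need at least two values")
        return (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)


sums = RunningSums()
for x in [3, 5, 7, 9]:
    sums.add(x)
sums.remove(5)   # the user withdraws the old answer
sums.add(11)     # and submits the new one
new_variance = sums.variance()
```

This form is simple, but it subtracts two large, nearly equal numbers when the mean is large compared with the spread, so it can lose precision in floating point.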

marty cohen
  • Sorry, I don't understand what you mean. Can you please provide a specific formula for the removal step? – Gili Jun 15 '20 at 04:02