130

Is there a way to incrementally calculate (or estimate) the average of a vector (a set of numbers) without knowing their count in advance?

For example, you have a = [4 6 3 9 4 12 4 18] and you want an estimate of the average, but you don't have all the values at hand, so you want a running average without keeping all the previous values available.

Ali
  • 1,721

5 Answers

147

You need to keep at least two pieces of information: the number of terms so far and the running mean (or something equivalent to it).

Let's suppose the $n$th component of the vector is $a_n$ and the running mean up to this point is $m_n$, so $$m_n= \frac{1}{n}\sum_{i=1}^n a_i.$$

Starting with $m_0=0$, you can use the obvious single pass $$m_n = \frac{(n-1)m_{n-1}+a_n}{n}$$ but precision errors are likely to be smaller if you use the equivalent $$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}.$$
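
As a quick illustration, here is a minimal Python sketch of the second update rule (the function name and structure are mine, not part of the answer):

    def running_mean(values):
        """Single-pass mean using m_n = m_{n-1} + (a_n - m_{n-1}) / n."""
        m, n = 0.0, 0
        for a in values:
            n += 1
            m += (a - m) / n
        return m

    print(running_mean([4, 6, 3, 9, 4, 12, 4, 18]))  # 7.5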

Henry
  • 157,058
  • 1
    Can you elaborate on why precision errors are smaller in the second expression? Does the first expression contain a subtraction of nearly equal values? – Paul Nov 18 '13 at 16:02
  • 5
    If $n$ is large, it is likely that $(n-1)m_{n-1}$ will be much larger than $a_n$, so making the addition less precise – Henry Nov 18 '13 at 19:58
  • 1
    What if n is so big that (using finite precision float point numbers) $$\frac{a_n - m_{n-1}}{n}$$ gives us zero? – castarco Jan 08 '14 at 18:57
  • 1
    @Henry yes, but we still have a problem. We reach that $n$, and one sample won't change the mean... but what if we have $k$ new samples? I think the solution is to calculate the mean using a "tree" of averages of sample subsets. – castarco Jan 09 '14 at 16:35
  • 3
    @castarco Suppose you have $n-k$ terms with mean $m_{n-k}$ and $k$ terms with mean $m_k$ with $0 \ll k \ll n$. Then your new combined mean would be $m_{n-k}+ \dfrac{k}{n}(m_k -m_{n-k})$. – Henry Jan 09 '14 at 16:49
  • The second and the third formulae are basically the same thing. Multiply and divide as necessary and you'll get the third one. But I see how precision errors are avoided in the third case. – Ashok Goli Sep 28 '16 at 12:47
  • What would be the best way to add weight to this? – Adam Moisa Jun 02 '17 at 06:14
  • 3
    @Adam: something like $ m_n = m_{n-1} + \dfrac{w_n(a_{n}-m_{n-1})}{w_n + \sum_{i=1}^{n-1} w_i}$ and instead of keeping the number of terms so far you need to keep the sum of weights so far (see the sketch after these comments) – Henry Jun 02 '17 at 06:55
  • @Henry Does $w_n$ refer to the current sum of weights, or the weight of the item I'm currently on? And what is $w_i$? I'm trying to write this in code (JS specifically) as I iterate through an array. Any further help would be very much appreciated! Here's what I have so far without weights:

    newMean = prevMean + ((newPrice - prevMean) / currentCount);

    My own weighted average function is this: weightCount += newWeight; weightSum += newPrice * newWeight; newWeightMean = weightSum / weightCount;

    But it seems yours would be more accurate...

    – Adam Moisa Jun 04 '17 at 08:45
  • @Henry I think I'm asking for a single pass version as that's easier for me to translate into code... – Adam Moisa Jun 04 '17 at 08:50
  • @Henry I figured it out. Thank you! – Adam Moisa Jun 04 '17 at 09:05
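
As a follow-up to the weighted-mean discussion above, here is a minimal Python sketch of Henry's weighted update (the variable names are mine); it keeps the running sum of weights in place of the count:

    def weighted_running_mean(pairs):
        """Update m += w_n * (a_n - m) / W, where W is the sum of weights so far."""
        m, weight_sum = 0.0, 0.0
        for a, w in pairs:
            weight_sum += w
            m += w * (a - m) / weight_sum
        return m

    # agrees with the direct weighted average (3*1 + 5*3) / (1 + 3) = 4.5
    print(weighted_running_mean([(3, 1), (5, 3)]))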
27

Intuition

It's an easy question and @Henry basically answered it. However, I think it would be nice to add some intuition on the second equation:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

The idea is to represent the new value $x_k$ by a part that is equal to the previous mean $\mu_{k-1}$ plus the remaining part $x_k-\mu_{k-1}$. The new mean $\mu_k$ then contains $k$ times the previous mean and one time the remaining part of the new value, divided by the total count. Okay, so how to get there?

Derivation

Let's say you already have the mean $\mu_{k-1}$ for the elements $x_0,...,x_{k-1}$. Of course, we can easily incorporate an additional value $x_k$ by undoing the division, adding the new value, and redoing the division (but with the new count):

$$\mu_k=\frac{(k-1)*\mu_{k-1}+x_k}{k}$$

Now, let's represent the new value by the previous mean and the difference to the previous mean:

$$x_k=\mu_{k-1}+(x_k-\mu_{k-1})$$

Putting that into our first equation:

$$\mu_k=\frac{(k-1)*\mu_{k-1}+\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

Look how we now got $k$ times the previous mean:

$$\mu_k=\frac{k*\mu_{k-1}+(x_k-\mu_{k-1})}{k}$$

The nice thing is that we don't have to undo and redo the division on the previous mean anymore:

$$\mu_k=\mu_{k-1}+\frac{x_k-\mu_{k-1}}{k}$$

As you can see, we still have to do the division on the difference of the new value to the previous mean. Since the new value comes in "sum-space" already, we don't have to undo anything on it, just divide by the total count.

Application

An application of this form of incremental mean can be found in the field of reinforcement learning where a value function is averaged over many experienced rewards. In this setting, you can think of $x_k-\mu_{k-1}$ as a temporal-difference error term. Since we usually don't know the total number of experiences here, we multiply by a small learning rate rather than dividing by $k$.
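
As a rough sketch of that idea in Python (the variable names and the learning-rate value are my own choices, not from the answer):

    alpha = 0.1   # small constant learning rate in place of 1/k
    value = 0.0   # running value estimate

    for reward in [4, 6, 3, 9, 4, 12, 4, 18]:
        value += alpha * (reward - value)  # step toward each new sample by the error term

With a constant $\alpha$ instead of $1/k$, this computes an exponentially weighted average that favors recent samples rather than the exact mean.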

danijar
  • 681
14

Yep, easy to do - I call it the travelling mean :)

Note you will need to keep track of: 1. the 'count' of values, 2. the previous 'mean', and 3. the 'new value'. The algorithm is:

in words: (('previous mean' * ('count' - 1)) + 'new value') / 'count'

The benefit: you have a running mean. You can do the same with the 'standard deviation'; it's just a little more complex.
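
A sketch in Python: the running mean as described, plus a running standard deviation via Welford's algorithm (the answer doesn't spell out the standard-deviation part; Welford's method is one common way to do it):

    import math

    count, mean, m2 = 0, 0.0, 0.0  # m2 accumulates the sum of squared deviations

    for new_value in [4, 6, 3, 9, 4, 12, 4, 18]:
        count += 1
        delta = new_value - mean
        mean += delta / count             # ((previous mean * (count - 1)) + new value) / count
        m2 += delta * (new_value - mean)  # Welford's update for the variance numerator

    print(mean, math.sqrt(m2 / count))    # running mean and (population) standard deviation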

  • 1
    but this has the same problem as above, right? If n becomes very large, the (previous mean) * (count - 1) 'could' cause an overflow (in computers) – Charles Harring Apr 15 '20 at 08:29
  • 1
    @CharlesHarring Exactly. If mean * count doesn't overflow your data type, then a simple running total won't overflow either, and will be more accurate to boot. – Boris B. Sep 14 '20 at 23:36
7

Here is the general solution to the problem. We start calculating the average from the $(m+1)$-th element up to the $(n+1)$-th element of the incoming data.
Given that:

  • $a_{n}$ is the $n$-th element of the incoming data,
  • $a_{m}$ is the element from which we start averaging,
  • $\widehat{a_{n-m}}$ is the average from the $m$-th to the $n$-th element,
  • $\widehat{a_{n+1-(m+1)}}$ is the average from the $(m+1)$-th to the $(n+1)$-th element

So, if we initially have $a_{m}, a_{m+1}, a_{m+2}, \ldots, a_{n}$, an average of $n-m$ elements can be easily calculated. That is $$ \widehat{a_{n-m}} = \frac{a_{m}+a_{m+1}+a_{m+2}+\ldots+a_{n}}{n-m} $$

Next, when the $(n+1)$-th element comes, we want to obtain the average over the same number of elements, that is, from $a_{m+1}$ to $a_{n+1}$:
$$ \widehat{a_{(n+1)-(m+1)}} = \frac{a_{m+1} + a_{m+2} + \ldots + a_{n} + a_{n+1}}{(n+1) - (m+1)} $$

Then, from the first equation, $$ \widehat{a_{n-m}}(n-m) - a_{m} = a_{m+1}+a_{m+2}+\ldots+a_{n} $$

Substituting this into the second equation: $$ \widehat{a_{(n+1)-(m+1)}} = \frac{\widehat{a_{n-m}}(n-m) - a_{m} + a_{n+1}}{n-m} $$

In order to dynamically calculate the average when a new element arrives, we need the previous average $\widehat{a_{n-m}}$, the first element of the previous average $a_{m}$, and the number of elements included in the average, $n-m$.
I hope you will find this solution inspiring. Please write if you find any errors in the above reasoning.
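
Here is a Python sketch of this update (the names are mine). Note that, as the comments below discuss, you still need the expiring element $a_m$, so this version keeps the current window in a deque:

    from collections import deque

    def sliding_mean(values, size):
        """Incrementally updated mean over a fixed-size window."""
        window = deque()
        avg = 0.0
        for a in values:
            if len(window) < size:
                window.append(a)
                avg += (a - avg) / len(window)  # still filling the window
            else:
                old = window.popleft()          # a_m expires out of the window
                window.append(a)
                avg += (a - old) / size         # one update, no full recompute
            yield avg

    print(list(sliding_mean([4, 6, 3, 9, 4, 12, 4, 18], 3)))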

  • I have to mention that I didn't manage to find any practical application for this average. I thought about using it to calculate a complexity-efficient average with a big amount of data (let's say n-m = 1000), but in the end I gave up. If you know how to store the element $a_{m}$ without keeping an $(n-m)$-element array, please let me know. – Michal Wolodzko Feb 10 '14 at 16:18
  • So, what happens in the case where you are going one step at a time and don't want to store any past information? Say you have the previous average and now one new element arrives: what else do you need to keep, the total count of elements and the first one? – Ali Feb 10 '14 at 16:20
  • Yes. You need at least to know how many elements you include in the average, the first element in the average, and the average from the previous step. Please note that $n-m$, the total count of elements in the average, can be a constant number. So at least one piece of information you get "for free" - you don't have to calculate it when new data arrives. – Michal Wolodzko Feb 12 '14 at 16:09
  • @MichalWolodzko This seems like a very dumb objection, but I'm just working through the equations in my head. Let's say that for the first equation, we want the average from n = 7 to m = 5. That gives three terms in the numerator (a_5, a_6, and a_7) and a denominator of 7 - 5 = 2. This doesn't seem right - surely the denominator must be the number of samples being averaged. Am I missing something obvious? – Hydrargyrum Aug 28 '19 at 14:21
  • 3
    By the way, I have a practical application for this formula. I'm doing moving average calculations in a real-time system where I want to incrementally update the moving average as old values expire out of the averaging window, without recalculating the entire average each time a new value arrives. – Hydrargyrum Aug 28 '19 at 14:23
0

Here is a solution similar to @Michal Wolodzko's but with a constant batch size $w$. I developed this because I need to take an average of a large sequence of high-resolution images. With this method, I can load $w$ images at a time and update the running average of the images. This is useful for computing a "background" state which can be used for further image analysis. The following derivation is for a sequence of numbers, but you can easily generalize it to a sequence of images with pixel intensities stored in $m \times n$ arrays.

Consider a sequence of numbers $x_0,...,x_{N-1}$.

Define the following:

  • $N$ length of the sequence
  • $w$ width or batch size
  • $N_b = \lfloor N/w \rfloor$ number of batches
  • $N_r = N \bmod w$ remaining elements that don't fit evenly into a batch
  • $\overline{S}_k = \frac{1}{w}\sum_{i=kw}^{(k+1)w-1} x_i$ average over batch $k$ where $0\leq k \leq N_b-1$
  • $\overline{S}_{r} = \frac{1}{N_r}\sum_{i=N_b w}^{N-1} x_i$ average over the remaining elements
  • $\overline{S}_{0:k} = \frac{1}{k}\sum_{i=0}^{k-1} \overline{S}_i$ average of the first $k$ batch averages
  • $\overline{S} = \frac{1}{N}\sum_{i=0}^{N-1}x_i$ the average over all the elements

Note that $$ \begin{align*} \overline{S}_{0:k} &= \frac{1}{k} \sum_{i=0}^{k-1} \overline{S}_i\\ &= \frac{1}{kw}\left(\overline{S}_0 w + \overline{S}_1 w + \cdots + \overline{S}_{k-1} w\right)\\ &= \text{average of $\{x_i\}$ up to element $kw$} \end{align*} $$ so we can get the running average $\overline{S}_{0:k}$ by just averaging the batch averages.

Now, we don't want to store all the batch averages. Instead, we want to update the running average with each newly computed batch average. Consider $$ \begin{align*} \overline{S}_{0:k+1}&=\frac{1}{k+1}\sum_{i=0}^k \overline{S}_i\\ &= \frac{1}{k+1}\left(\sum_{i=0}^{k-1} \overline{S}_i + \overline{S}_k\right)\\ &= \frac{1}{k+1}\left(\overline{S}_{0:k}\,k + \overline{S}_k\right)\\ &= \frac{k}{k+1}\overline{S}_{0:k} + \frac{1}{k+1}\overline{S}_k \end{align*} $$ Thus we can just update the running average by taking a weighted average of what we've computed so far. The last step is to update the average with the awkward number of remaining elements $N_r$: $$ \begin{align*} \overline{S} &= \frac{1}{N}\left(\overline{S}_{0:N_b}N_b w + \overline{S}_r N_r\right)\\ &= \frac{N_b w}{N}\overline{S}_{0:N_b} + \frac{N_r}{N}\overline{S}_r \end{align*} $$ Overall, this method is nice because we have no risk of overflow (computing an excessively large sum) and no need to load all the numbers (or images) into memory at once.

Here's some python code demonstrating the method.

    import numpy as np
    import math

    # replace this with a sequence of images
    # only need to load (w) at a time
    x = np.random.rand(37)

    w = 5                    # batch size
    N = len(x)               # length of sequence
    Nb = math.floor(N / w)   # number of batch averages
    Nr = N % w               # remaining elements to average

    Srun = 0                 # running average
    for k in range(0, Nb):
        Sk = np.mean(x[k*w:(k+1)*w])        # batch average
        Srun = Srun*(k/(k+1)) + Sk/(k+1)    # updated running average
    Sr = np.mean(x[Nb*w:])                  # average over the remaining elements
    Srun = Srun*(Nb*w/N) + Sr*(Nr/N)        # final running average

    # compare the running average
    print('Numpy:   ', x.mean())
    print('Running: ', Srun)

with output

Numpy:  0.4537805990791784
Running:  0.45378059907917856
nwsteg
  • 329