Variance of Variance Estimation Simulation in Matlab

Question

I am trying to verify, through numerical simulation, the expression for the variance of the variance estimator, namely:

$$ \text{Var}(s^2) = \frac{2 \sigma^4}{n} $$

where $n$ is the number of samples and $\sigma$ is the standard deviation of the process (assumed Gaussian), but something is wrong. (For the source of this expression, see http://www.statlect.com/variance_estimation.htm/)

The numerical simulation is quite simple, as shown in the Matlab code below:

% number of Monte Carlo runs
NMC = 10000;
% number of samples at each run
n = 100;

% true standard deviation of the random process
stdan = 0.1;
stdMC = [];

% Monte Carlo simulation
for j=1:NMC
    % generate n samples of the random process:
    % white noise with zero mean and standard deviation stdan
    y = stdan*randn(n, 1);

    % estimate the standard deviation from the n samples of this run
    std_i = std(y);

    % store it
    stdMC = cat(1, stdMC, std_i);
end

% compute the estimation error (difference between the estimated standard deviation at each run and the true one)
estd = stdMC - repmat(stdan, NMC, 1);

% compute the expected (analytical) variance and standard deviation of the estimator
estd_var_an = (2/n)*stdan^4
estd_std_an = sqrt(estd_var_an)

% compute the empirical variance and standard deviation of the estimates
estd_var = var(estd)
estd_std = std(estd)

But it returns

estd_var_an =
   2.0000e-06
estd_std_an =
    0.0014
estd_var =
   5.1535e-05
estd_std =
    0.0072

This shows that the simulated variance of the estimator, estd_var, is totally different from the analytical value estd_var_an (about 25 times larger).

See also: Variance of sample variance?

To improve chances of getting a useful answer, you should describe in terms of samples and random variables just what problem you are trying to simulate. As it is, you are asking us to (a) figure out what you are trying to do in an un-commented computer simulation and then (b) de-bug your code. Alan Turing once commented that it is possible to write a program such that no one can discern its purpose. You may have given an instance of that. Please edit your Question to make the objective clear and include comments in your program. When done, please leave me a Comment, and I'll look again. — BruceET, Sep 09 '15 at 20:58
I assumed, for a moment, that the code would be self-explanatory. My mistake, sorry. I tried to build the simplest possible test of the estimator of the variance of a Gaussian random process. — Jose Ospina, Sep 09 '15 at 21:43
The meaning of your comment about 'totally different (25 larger)' escapes me. It seems wrong. Can you express it in terms of the notation in my Answer instead of your code? — BruceET, Sep 09 '15 at 23:17

BruceET · Accepted Answer · 2015-09-09T23:07:42.517

Let $X_1, X_2, \dots, X_n$ be a random sample from $Norm(\mu, \sigma)$. As usual, define $$S^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}.$$ Then the sample variance $S^2$ is unbiased; that is, $E(S^2) = \sigma^2$. Moreover, $(n - 1)S^2/\sigma^2 \sim Chisq(df = n-1).$ Because the variance of this distribution is $2(n-1),$ we have $V(S^2) = 2\sigma^4/(n-1).$ Of course, by definition $V(S^2) = E[(S^2 - \sigma^2)^2].$
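The step from the chi-squared distribution to $V(S^2)$ is only a rescaling: since $V(aY) = a^2 V(Y)$ for a constant $a$,

$$ V\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}\,V(S^2) = 2(n-1) \quad\Longrightarrow\quad V(S^2) = \frac{2\sigma^4}{n-1}. $$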

In practice, $\mu$ is rarely known. But if it is, one can estimate $\sigma^2$ with $S_\mu^2 = (1/n)\sum(X_i - \mu)^2.$ Then, similarly $V(S_\mu^2) = 2\sigma^4/n.$
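The known-mean case follows the same way. Each $(X_i - \mu)/\sigma$ is standard normal and the $n$ terms are independent, so no degree of freedom is lost:

$$ \frac{n S_\mu^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim Chisq(df = n) \quad\Longrightarrow\quad V(S_\mu^2) = \frac{\sigma^4}{n^2}\cdot 2n = \frac{2\sigma^4}{n}. $$

This is the quantity the question's code computes as (2/n)*stdan^4.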

I am more familiar with R, so I will use it to demonstrate the first case in which $\mu$ is unknown, and leave it to you to figure out the MatLab code and the conversion to the less commonly used case (if that is really what you intend).

Specifically, I use $n = 25,$ $\mu = 50$ and $\sigma = 10$ for the samples, and simulate $m = 100\,000$ samples. Simulated values are $E(S^2) \approx 99.98$ and $V(S^2) \approx 831,$ which are within simulation error of the exact values 100 and 833.33, respectively.

 m = 10^5;  n = 25;  mu = 50;  sg = 10
 x = rnorm(n*m, mu, sg)
 DTA = matrix(x, nrow=m)  # each row a sample of n
 v = apply(DTA, 1, var)   # vector of m sample variances (dim 1 for rows)
 mean(v)
 ## 99.98174
 var(v)
 ## 830.7297
 2*sg^4/(n-1)
 ## 833.3333
 mean((v-sg^2)^2)
 ## 830.7217
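The same experiment can be sketched in Python with NumPy, extended to the known-mean estimator $S_\mu^2$ that the answer leaves as an exercise. This is an illustrative translation, not the answerer's code: the seed and the names s2 and s2_mu are arbitrary choices, and the printed numbers will differ slightly from the R output because the random streams differ.

```python
import numpy as np

# Parameters mirroring the R example: n = 25, mu = 50, sigma = 10, m = 100000 samples.
m, n, mu, sg = 100_000, 25, 50.0, 10.0
rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility only

# Each row is one sample of size n.
data = rng.normal(mu, sg, size=(m, n))

# Usual sample variance S^2: mean estimated from the data, divisor n-1 (ddof=1 matches R's var()).
s2 = data.var(axis=1, ddof=1)

# Known-mean estimator S_mu^2 = (1/n) * sum((X_i - mu)^2).
s2_mu = np.mean((data - mu) ** 2, axis=1)

print("E(S^2)    ~", s2.mean(), "  exact:", sg**2)
print("V(S^2)    ~", s2.var(), "  exact:", 2 * sg**4 / (n - 1))
print("E(S_mu^2) ~", s2_mu.mean(), "  exact:", sg**2)
print("V(S_mu^2) ~", s2_mu.var(), "  exact:", 2 * sg**4 / n)
```

Setting ddof=1 gives the $n-1$ divisor of $S^2$; the known-mean estimator divides by $n$ because nothing is estimated from the data, and its variance (800 here) is correspondingly a little smaller than 833.33.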

Now I see that I was mixing standard deviation and variance. Thanks for your help — Jose Ospina, Sep 10 '15 at 08:21
@bruceET: how come $(n-1)S^2/\sigma^2$, which is a sum of $n$ terms, follows a chi-squared distribution with $n-1$ degrees of freedom and not $n$? — mocquin, Sep 27 '19 at 10:08
The 'intuitive' answer is that one degree of freedom is 'lost' estimating $\mu$ by $\bar X.$ Perhaps see this Q&A for more details. — BruceET, Sep 27 '19 at 16:16
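The 'lost' degree of freedom can also be seen numerically. The NumPy sketch below is a hypothetical illustration (n = 5, μ = 50, σ = 10 and the seed are arbitrary choices, not from the thread) that compares the scaled sum of squares about the true mean μ with the one about the sample mean $\bar X$. A chi-squared variable with k degrees of freedom has mean k and variance 2k, so the two sums should look like $Chisq(df = 5)$ and $Chisq(df = 4)$ respectively.

```python
import numpy as np

# Illustrative settings; a small n makes the n vs n-1 difference easy to see.
m, n, mu, sg = 200_000, 5, 50.0, 10.0
rng = np.random.default_rng(seed=2)
data = rng.normal(mu, sg, size=(m, n))

# Sum of squares about the true mean, scaled by sigma^2: Chisq(df = n).
q_mu = np.sum((data - mu) ** 2, axis=1) / sg**2

# Sum of squares about the sample mean Xbar, scaled by sigma^2: Chisq(df = n-1).
xbar = data.mean(axis=1, keepdims=True)
q_bar = np.sum((data - xbar) ** 2, axis=1) / sg**2

print("about mu:   mean", q_mu.mean(), " var", q_mu.var(), " (expect 5 and 10)")
print("about Xbar: mean", q_bar.mean(), " var", q_bar.var(), " (expect 4 and 8)")
```

Each sum about $\bar X$ is never larger than the corresponding sum about $\mu$, because $\sum (X_i - \mu)^2 = \sum (X_i - \bar X)^2 + n(\bar X - \mu)^2$; the missing term $n(\bar X - \mu)^2/\sigma^2$ is itself chi-squared with one degree of freedom.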

score 0 · Answer 2 · answered Sep 10 '15 at 08:20

In line

estd_var = var(estd)

I was computing the variance of the standard-deviation estimates (estd is the difference between each run's std(y) and the true standard deviation), not the variance of the variance estimates.

Once again a reminder that moving back and forth between variance and standard deviation leads to errors. Stick with variance.
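A minimal sketch of the corrected comparison, written as a NumPy translation rather than Matlab, using the question's settings (NMC = 10000, n = 100, σ = 0.1); the seed and the names std_hat and var_hat are illustrative. It measures both the spread of the standard-deviation estimates (what estd_var actually measured) and the spread of the variance estimates (what was intended). Because Matlab's std and var subtract the sample mean, the matching exact value is $2\sigma^4/(n-1)$ rather than $2\sigma^4/n$, about a 1% difference here; the value quoted for the standard deviation is only a large-$n$ approximation.

```python
import numpy as np

# Same settings as the Matlab code in the question.
NMC, n, stdan = 10_000, 100, 0.1
rng = np.random.default_rng(seed=3)  # arbitrary seed

# Each row is one Monte Carlo run of n zero-mean Gaussian samples.
y = stdan * rng.standard_normal((NMC, n))

std_hat = y.std(axis=1, ddof=1)  # like Matlab std(y): divisor n-1
var_hat = y.var(axis=1, ddof=1)  # like Matlab var(y): divisor n-1

# What the original code measured: spread of the standard-deviation estimates.
print("var(std_hat) =", std_hat.var(ddof=1),
      "  approx sigma^2/(2(n-1)) =", stdan**2 / (2 * (n - 1)))

# What was intended: spread of the variance estimates.
print("var(var_hat) =", var_hat.var(ddof=1),
      "  exact 2 sigma^4/(n-1) =", 2 * stdan**4 / (n - 1))
```

The factor of about 25 in the question is no coincidence: by the delta method, $V(S^2) \approx (2\sigma)^2 V(S)$, so $V(S)/V(S^2) \approx 1/(4\sigma^2)$, which equals 25 for $\sigma = 0.1$ regardless of $n$.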
