I'm trying to find the variance of the set of numbers $90,90,80,100,99,81,98,82$. When I do it by hand or use online calculator sites, I get $61.25$, but when I put the data into a stat list and use the variance( command on my TI-84, I get $70$. My calculator works, and I'm absolutely positive I entered the list correctly. What's wrong?

tinlyx
    It depends on how the variance is defined. It can be defined as either

    $$\frac 1 n \cdot \sum_{i = 1}^{n}(x_i - \mu)^2$$ or $$\frac 1 {n-1} \cdot \sum_{i = 1}^{n}(x_i - \mu)^2$$

    – thanasissdr Jun 30 '18 at 02:40

2 Answers

There are statisticians who prefer using $n-1$ rather than $n$ when computing variance. Your sum of squared differences from the mean is indeed 490. Divide by either 8 or 7 to get 61.25 or 70, respectively.
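
For reference, the arithmetic behind the 490 can be written out as follows. This is only a worked check using the eight values from the question; the symbol $\bar x$ for their mean is notation introduced here.

$$\bar x = \frac{90+90+80+100+99+81+98+82}{8} = \frac{720}{8} = 90$$

$$\sum_{i=1}^{8}(x_i - \bar x)^2 = 0^2 + 0^2 + (-10)^2 + 10^2 + 9^2 + (-9)^2 + 8^2 + (-8)^2 = 490$$

$$\frac{490}{8} = 61.25, \qquad \frac{490}{7} = 70$$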

https://en.wikipedia.org/wiki/Bessel%27s_correction

Will Jagy

Many statistics and probability books make a distinction between two kinds of 'variances'.

(a) Variance of a finite population $\sigma^2 = \frac 1 N \sum_{i=1}^N (X_i - \mu)^2,$ where $\mu = \frac 1 N \sum_{i=1}^N X_i$ is the population mean and $N$ the population size. For your $N=8$ values, $\sigma^2 = 61.25.$

(b) Variance of a sample from an infinite population $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2,$ where $\bar X = \frac 1 n \sum_{i=1}^n X_i$ is the sample mean and $n$ is the sample size. For your $n = 8$ values, $S^2 = 70.$

A sample variance defined with $n-1$ in the denominator has the advantage that $E(S^2) = \sigma^2$ whenever the population variance $\sigma^2$ exists. Accordingly, one says that $S^2$ is an 'unbiased' estimator of $\sigma^2.$ [For further discussion and a formal proof of unbiasedness, see this page (especially answers by me and @Vivek)--among others on this site and elsewhere online.]
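
The key step behind that unbiasedness claim can be sketched briefly. This is an outline rather than a full proof, and it assumes that $X_1, \dots, X_n$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2,$ so that $E[(X_i - \mu)^2] = \sigma^2$ and $E[(\bar X - \mu)^2] = \sigma^2/n.$

$$\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n (X_i - \mu)^2 - n(\bar X - \mu)^2$$

$$E\left[\sum_{i=1}^n (X_i - \bar X)^2\right] = n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2$$

Dividing by $n-1$ therefore gives $E(S^2) = \sigma^2,$ whereas dividing by $n$ would give $\frac{n-1}{n}\sigma^2.$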


Some calculators give a choice whether to compute a standard deviation (square root of variance) for entered values regarded as a population or as a sample. Sometimes buttons are labeled $\sigma_n$ and $\sigma_{n-1}.$ Neither is perfect notation, but the choice is clear. (On the TI-84, the distinction is made between the sample standard deviation '$\text{Sx}$' and the population standard deviation '$\sigma\text{x}$'; see p206 of the Guide Book.)

In R statistical software, the function var implements the formula for the sample variance (denominator $n-1$).

x = c(90,90,80,100,99,81,98,82);  var(x)
## 70        # variance of vector x treated as a sample
N = length(x);  (N-1)*var(x)/N
## 61.25     # adjustment to treat vector x as a population
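
As a cross-check, both values can also be obtained directly from the two definitions rather than by adjusting var. The following is a minimal sketch; the name ss for the sum of squared deviations is introduced here for illustration.

```r
x = c(90,90,80,100,99,81,98,82)
ss = sum((x - mean(x))^2)   # sum of squared deviations from the mean
ss
## 490
ss/length(x)                # divisor n: x treated as a population
## 61.25
ss/(length(x) - 1)          # divisor n-1: x treated as a sample, same as var(x)
## 70
```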

Note: A brief simulation in R illustrates the unbiasedness of $S^2$ as an estimator of $\sigma^2.$ A million samples of size $n = 1000$ are drawn from a normal population with mean $\mu = 50,$ standard deviation $\sigma = 5,$ and variance $\sigma^2 = 25.$ The sample variance $S^2$ (denoted v in the program) is found for each sample. The average of the one million sample variances approximates $E(S^2)$ to several significant digits. (Even for samples of size $n = 1000,$ there is considerable variability among the values of $S^2;$ in this particular simulation the smallest sample variance was 19.82 and the largest was 30.87. "Variances are very variable.")

set.seed(630);  m = 10^6
v = replicate(m,  var(rnorm(1000, 50, 5)))  # vector of 10^6 sample variances
mean(v)
##  25.00077  # approx. E(v) = 25
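
With samples as large as $n = 1000,$ the divisors $n$ and $n-1$ give nearly the same result, so the effect of the choice is easier to see with small samples. The variation below is only a sketch: the values m = 10^5 and n = 5 and the names v.s and v.p are choices made here for illustration. No printed results are shown; in theory the divisor-$n$ version has expected value $\frac{n-1}{n}\sigma^2 = 20.$

```r
set.seed(630);  m = 10^5;  n = 5          # m and n are illustrative choices
v.s = replicate(m, var(rnorm(n, 50, 5)))  # sample variances, divisor n-1
v.p = (n-1)*v.s/n                         # same samples, divisor n
mean(v.s)    # expected to be close to sigma^2 = 25
mean(v.p)    # expected to be close to (n-1)/n * 25 = 20
```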
BruceET