I read the following in this paper (https://iopscience.iop.org/article/10.1088/0026-1394/41/3/004/pdf), on page 133: "The conditional standard deviation (...) is necessarily an underestimate of its unconditional standard deviation." I am trying to understand why this statement is true.
I will now outline my understanding of this in 3 parts.
Part 1: In this link (Proving that Sample Variance is an unbiased estimator of Population Variance), a proof is given that shows the sample variance is an unbiased estimator of the population variance:
$$E(S^2) = \frac{n-1}{n}E(X_1-Y_1)^2 = \frac{n-1}{n}\text{var}(X_1-Y_1) = \frac{n-1}{n}\left(\sigma^2 + \frac{\sigma^2}{n-1}\right) = \sigma^2,$$
where $Y_1$ denotes the mean of the remaining $n-1$ observations, which is independent of $X_1$, so that $\text{var}(X_1-Y_1) = \sigma^2 + \frac{\sigma^2}{n-1}$.
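To double-check Part 1 numerically, here is a quick Monte Carlo sketch (my own check, not from the linked post; it assumes normally distributed data with $\sigma^2 = 4$ and $n = 10$ purely for illustration):

```python
import numpy as np

# Monte Carlo check that E[S^2] is close to sigma^2 (unbiasedness).
# Assumptions (mine, for illustration only): X_i ~ Normal(0, sigma^2 = 4), n = 10.
rng = np.random.default_rng(0)
sigma2 = 4.0        # true population variance
n = 10              # sample size
reps = 200_000      # number of simulated samples

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)   # sample variance with the n-1 denominator

print("average of s^2 over replications:", s2.mean())  # should be close to 4.0
print("true sigma^2:                    ", sigma2)
```

The average of the $s^2$ values settles near $\sigma^2$, which is what unbiasedness says about the expectation (it says nothing about any single sample).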
Part 2: In this link (https://stats.stackexchange.com/questions/496424/how-to-prove-s2-is-a-consistent-estimator-of-sigma2), a proof is given that shows the sample variance is a consistent estimator of the population variance:
\begin{align*}
\mathbb{P}(\mid s^2 - \sigma^2 \mid > \varepsilon )
&= \mathbb{P}(\mid s^2 - \mathbb{E}(s^2) \mid > \varepsilon )\\
&\leqslant \dfrac{\text{var}(s^2)}{\varepsilon^2}\\
&= \dfrac{1}{\varepsilon^2 (n-1)^2}\cdot \text{var}\left[\sum (X_i - \overline{X})^2\right]\\
&= \dfrac{\sigma^4}{\varepsilon^2 (n-1)^2}\cdot \text{var}\left[\frac{\sum (X_i - \overline{X})^2}{\sigma^2}\right]\\
&= \dfrac{\sigma^4}{\varepsilon^2 (n-1)^2}\cdot \text{var}(Z_n)\\
&= \dfrac{\sigma^4}{\varepsilon^2 (n-1)^2}\cdot 2(n-1) = \dfrac{2\sigma^4}{\varepsilon^2 (n-1)} \stackrel{n\to\infty}{\longrightarrow} 0,
\end{align*}
where $Z_n := \sum (X_i - \overline{X})^2/\sigma^2 \sim \chi^2_{n-1}$ (for normal data), so $\text{var}(Z_n) = 2(n-1)$.
Thus, $\displaystyle\lim_{n\to\infty} \mathbb{P}(\mid s^2 - \sigma^2 \mid > \varepsilon ) = 0$, i.e. $s^2 \stackrel{\mathbb{P}}{\longrightarrow} \sigma^2$ as $n\to\infty$, which tells us that $s^2$ is a consistent estimator of $\sigma^2$.
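As a sanity check of my reading of Part 2 (again my own sketch, not part of the linked answer), one can estimate $\mathbb{P}(\mid s^2 - \sigma^2 \mid > \varepsilon)$ for growing $n$; I assume normal data with $\sigma^2 = 4$ and $\varepsilon = 0.5$:

```python
import numpy as np

# Empirical check of consistency: P(|s^2 - sigma^2| > eps) should shrink as n grows.
# Assumptions (mine, for illustration only): X_i ~ Normal(0, sigma^2 = 4), eps = 0.5.
rng = np.random.default_rng(1)
sigma2, eps, reps = 4.0, 0.5, 20_000

for n in (10, 100, 1000):
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    s2 = samples.var(axis=1, ddof=1)
    prob = np.mean(np.abs(s2 - sigma2) > eps)
    bound = 2 * sigma2**2 / (eps**2 * (n - 1))   # Chebyshev bound from the display above
    print(f"n={n:5d}  empirical prob={prob:.4f}  Chebyshev bound={bound:.4f}")
```

The empirical probability shrinks towards zero with $n$ (the Chebyshev bound is loose, and exceeds 1, for small $n$, but it also goes to zero).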
Part 3: Using some algebraic manipulation, I can see that the sample variance appears to always be less than the population variance:
The sample variance is defined as:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2$$
where $n$ is the sample size, $x_i$ are the individual observations, and $\bar{x}$ is the sample mean; the population variance is denoted by $\sigma^2$.
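Concretely, this is the quantity I mean by $s^2$ in code (my own illustrative snippet with made-up numbers; the $n-1$ denominator corresponds to `ddof=1` in NumPy):

```python
import numpy as np

# Sample variance from the definition above vs. NumPy's built-in with ddof=1.
x = np.array([2.1, 3.5, 4.0, 1.8, 2.9])       # a small made-up sample
n = x.size
xbar = x.mean()                                # sample mean

s2_manual = np.sum((x - xbar) ** 2) / (n - 1)  # (1/(n-1)) * sum of squared deviations
s2_numpy = x.var(ddof=1)                       # same quantity via NumPy

print(s2_manual, s2_numpy)                     # the two values agree
```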
OLS (Ordinary Least Squares) tells us that $\bar{x}$ minimizes the sum of squared deviations - thus:
$$\sum_{i=1}^n (x_i - \bar{x})^2 \leq \sum_{i=1}^n (x_i - \mu)^2$$
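(For completeness, here is the standard decomposition behind this inequality - I am adding this step myself; it only uses the definitions above:)
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n \left[(x_i - \bar{x}) + (\bar{x} - \mu)\right]^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2 \geq \sum_{i=1}^n (x_i - \bar{x})^2,$$
where the cross term $2(\bar{x} - \mu)\sum_{i=1}^n (x_i - \bar{x})$ vanishes because $\sum_{i=1}^n (x_i - \bar{x}) = 0$.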
Dividing both sides by $n-1$:
$$\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \leq \frac{1}{n-1} \sum_{i=1}^n (x_i - \mu)^2$$
Further simplifying and substituting (for large enough $n$):
$$\sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2$$
$$\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \leq \frac{1}{n-1} n\sigma^2 = \sigma^2$$
$$s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 \leq \sigma^2$$
This proves that the sample variance $s^2$ is always less than or equal to the population variance $\sigma^2$. On another note, using informal logic, an argument can be made that a sample might not include extreme outliers, whereas the population does include them. Extreme outliers have large deviations from the mean - thus, their presence increases the variance. Hence, the sample variance can technically never be larger than the population variance.
My Question: How can all 3 parts be correct simultaneously? If the sample variance estimates the population variance without any bias (Part 1) and converges to the population variance for large samples (Part 2), then how can it be guaranteed to always be less than or equal to the population variance (Part 3)? Is this not a contradiction?
Thanks!
References:

- https://iopscience.iop.org/article/10.1088/0026-1394/41/3/004/pdf (the quoted statement is on page 133)
- "Proving that Sample Variance is an unbiased estimator of Population Variance" (Stack Exchange post linked in Part 1)
- https://stats.stackexchange.com/questions/496424/how-to-prove-s2-is-a-consistent-estimator-of-sigma2