Averages: why do we minimize the error function, is it simplicity?

Question

If we have a sequence of random numbers, one way to predict the next ones while minimizing the overall error is to use their average.

It was explained to me, to some extent, here.

I don't quite understand why the function we minimize is the sum of squares: $$ f(\alpha) = \sum_{i=1}^n (x_i - \alpha)^2 $$

I do understand that using $f(\alpha) = \sum_{i=1}^n (x_i - \alpha)$ wouldn't take us far, though (apparently).
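For reference, the standard calculus argument (not spelled out in the thread) shows why the squared version has the average as its minimizer, and why the plain sum has no minimizer at all:

$$
\begin{aligned}
f(\alpha) &= \sum_{i=1}^n (x_i - \alpha)^2, \\
f'(\alpha) &= -2\sum_{i=1}^n (x_i - \alpha) = -2\Big(\sum_{i=1}^n x_i - n\alpha\Big) = 0
\quad\Longrightarrow\quad \alpha = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}, \\
f''(\alpha) &= 2n > 0.
\end{aligned}
$$

Equivalently, $f(\alpha) = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \alpha)^2$, an upward-opening parabola in $\alpha$ with its vertex at $\bar{x}$. By contrast, the plain sum $g(\alpha) = \sum_{i=1}^n (x_i - \alpha) = \sum_{i=1}^n x_i - n\alpha$ is linear in $\alpha$. It decreases without bound as $\alpha$ grows, so it has no minimum, because positive and negative errors cancel. Its zero is still at $\bar{x}$, which is why it appears inside $f'(\alpha)$.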

And is there any visual or geometric way to find the solution instead?
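One geometric reading is the standard linear-algebra view, which is not something stated in the thread. Treat the $n$ observations as a vector $x \in \mathbb{R}^n$. Then $f(\alpha)$ is the squared Euclidean distance between $x$ and the point $\alpha(1, \dots, 1)$ on the diagonal line. Minimizing it therefore means finding the closest point on that line, which is the orthogonal projection of $x$ onto it. The following minimal sketch assumes NumPy and uses a made-up sample:

```python
import numpy as np

# Made-up sample; any numbers would do.
x = np.array([2.0, 3.0, 5.0, 7.0, 8.0])
ones = np.ones_like(x)

# Orthogonal projection of x onto the line spanned by (1, ..., 1):
# the coefficient is <x, 1> / <1, 1> = sum(x) / n, i.e. the mean.
alpha = x @ ones / (ones @ ones)
residual = x - alpha * ones

print(alpha)             # 5.0
print(residual @ ones)   # 0.0: the residual is orthogonal to (1, ..., 1)
```

The zero inner product is the geometric counterpart of the condition $\sum_{i=1}^n (x_i - \alpha) = 0$: the errors left over after the projection are perpendicular to the diagonal.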

The point is that "minimizing the overall error" needs to be defined. What function exactly do you want to minimize? Least squares has some nice analytic properties (such as the Gauss-Markov theorem), but there is nothing sacred about it. You could just take $\sum |x_i-\alpha|$ if you prefer (among many other choices). — lulu, Aug 08 '22 at 13:01
@lulu I mean minimizing the error when predicting the next values in the sequence. Does that function also yield the average of X? — Mah Neh, Aug 08 '22 at 13:02
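A quick numerical check of lulu's alternative addresses the follow-up question. Minimizing $\sum |x_i-\alpha|$ does not give the average in general. It is minimized by the median, uniquely when the number of points is odd. The following brute-force sketch evaluates both losses over a grid of candidate values of $\alpha$. It assumes NumPy, uses a made-up skewed sample, and sets the grid spacing to an arbitrary value:

```python
import numpy as np

# Made-up skewed sample, chosen so that mean (6.0) and median (3.0) differ.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])

# Candidate predictions alpha with step 0.125 (exactly representable in binary).
grid = np.linspace(0.0, 25.0, 201)

# Rows index observations, columns index candidate alphas.
sq_loss = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
abs_loss = np.abs(x[:, None] - grid[None, :]).sum(axis=0)

best_sq = grid[np.argmin(sq_loss)]
best_abs = grid[np.argmin(abs_loss)]

print(best_sq, x.mean())        # 6.0 6.0: squared loss is minimized at the mean
print(best_abs, np.median(x))   # 3.0 3.0: absolute loss is minimized at the median
```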
Well, anything you use will have some variance to it. Gauss-Markov tells us that, among linear unbiased estimators, least squares minimizes that variance (under certain assumptions, of course). — lulu, Aug 08 '22 at 13:12
@lulu Thanks for the information, I will try to read up on it. — Mah Neh, Aug 08 '22 at 13:43
@Mah Neh: The choice of loss function is connected to the robustness and efficiency of an estimator. For example, $\bar{x}$ is an efficient estimator (its variance is low), but it also has a lower breakdown point than, say, the median. The breakdown point is the fraction of "outlier observations" that must be introduced before the estimator gets totally distorted. Notice that the median hardly changes if a large outlier is introduced, whereas the mean changes a lot. There is a whole literature on this in statistics, which might be what you're after. — mark leeds, Aug 10 '22 at 10:18
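A toy example illustrates the breakdown-point remark. Changing a single observation to a huge value drags the mean as far as we like, while the median does not move. This sketch uses made-up numbers and assumes NumPy:

```python
import numpy as np

clean = np.array([2.0, 3.0, 4.0, 5.0, 6.0])

# Replace a single observation with a gross outlier (arbitrary value).
contaminated = clean.copy()
contaminated[-1] = 1000.0

print(clean.mean(), np.median(clean))                # 4.0 4.0
print(contaminated.mean(), np.median(contaminated))  # 202.8 4.0
```

Making the outlier larger makes the mean larger still, while the median stays at 4.0. One bad point out of five is enough to break the mean.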
@lulu Would you mind explaining, in a short sentence, what you mean by variance there? — Mah Neh, Aug 10 '22 at 13:06
Any sampling method has error. Presumably you are only interested in unbiased sampling, so the sample mean is expected to match the true mean. But that just means the sample mean is a random variable whose mean equals the true mean; it still has a variance. Often, "optimal" sampling means minimizing that variance. — lulu, Aug 10 '22 at 13:37
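A simulation makes the idea of "the variance of an estimator" concrete. For a normal distribution, both the sample mean and the sample median are centred on the true mean. The mean, however, has the smaller spread. The following rough Monte Carlo sketch assumes NumPy, and its settings (distribution, sample size, number of repetitions, seed) are arbitrary. The median is not a linear estimator, so this comparison illustrates variance in general rather than the Gauss-Markov theorem itself:

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
n, reps = 101, 20_000

# Draw `reps` independent samples of size n from a standard normal distribution.
samples = rng.standard_normal((reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Both estimators are centred on the true mean 0, but their spreads differ.
print(means.var())    # close to 1/n, about 0.0099
print(medians.var())  # close to pi/(2n), about 0.0156 (large-sample approximation)
```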

score 1 · Answer 1 · answered Aug 10 '22 at 07:55

To add to the comments above: computing a least-squares estimate doesn't even require knowledge of the distribution of the random variable, which makes it special among estimation methods. It has limitations as well, though. For the least-squares estimate to be unique, the number of observations (knowns) must be at least the number of unknowns. More precisely, the design matrix must have full column rank.
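For comparison, the averaging problem in the question is itself a least-squares problem with a single unknown ($\alpha$) and $n$ observations. Each observation contributes one equation $\alpha = x_i$, and the least-squares solution of this system is the mean. The following minimal sketch assumes NumPy, uses `numpy.linalg.lstsq`, and runs on a made-up sample:

```python
import numpy as np

# Made-up data: 5 observations (knowns), 1 unknown (a constant alpha).
x = np.array([2.0, 3.0, 5.0, 7.0, 8.0])
A = np.ones((x.size, 1))   # design matrix: row i encodes the equation alpha = x_i

# Solve the overdetermined system A @ [alpha] ≈ x in the least-squares sense.
coef, residuals, rank, sv = np.linalg.lstsq(A, x, rcond=None)

print(coef[0], x.mean())   # both approximately 5.0
print(residuals[0])        # sum of squared errors at the optimum (about 26.0)
```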

