If we have a sequence of several random numbers, one way to predict the next ones while minimizing the overall error is to use the average.
This was explained to me, to some extent, here.
What I don't quite understand is why the function that we minimize is the sum of the squares: $$ f(\alpha) = \sum_{i=1}^n (x_i - \alpha)^2 $$
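For reference, this is the calculus step as I understand it, which is why the average shows up at all. It is only a sketch, and it assumes that $f$ is differentiable in $\alpha$ and that the $x_i$ are fixed real numbers:

$$ f'(\alpha) = -2\sum_{i=1}^n (x_i - \alpha) = 0 \quad\Longleftrightarrow\quad \alpha = \frac{1}{n}\sum_{i=1}^n x_i $$

Since $f''(\alpha) = 2n > 0$, this stationary point is a minimum.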
I do understand that using the signed sum $\sum_{i=1}^n (x_i - \alpha) = f(\alpha)$ instead wouldn't take us far, though (apparently).
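To make the comparison concrete, here is a minimal numerical sketch in Python. The sample `xs`, the grid of candidates, and the helper names `sum_of_squares` and `signed_sum` are all made up for illustration. It scans candidate values of $\alpha$ and checks that the sum of squares is smallest at the average, whereas the signed sum just keeps decreasing as $\alpha$ grows.

```python
# Hypothetical sample; any list of numbers would do.
xs = [2.0, 3.0, 7.0, 8.0]


def sum_of_squares(alpha, xs):
    """Sum of squared deviations of the x_i from alpha."""
    return sum((x - alpha) ** 2 for x in xs)


def signed_sum(alpha, xs):
    """Sum of signed deviations; linear in alpha, so it has no minimum."""
    return sum(x - alpha for x in xs)


mean = sum(xs) / len(xs)

# Scan a grid of candidates from 0.00 to 10.00 and keep the one
# with the smallest sum of squares.
candidates = [i / 100 for i in range(0, 1001)]
best = min(candidates, key=lambda a: sum_of_squares(a, xs))

print("mean:", mean)
print("grid minimizer of the sum of squares:", best)
print("signed sum at the mean:", signed_sum(mean, xs))
print("signed sum at alpha = 100:", signed_sum(100.0, xs))
```

With this sample the grid minimizer coincides with the average. The signed sum is zero at the average, but it can be made as negative as we like by increasing $\alpha$, so "minimizing" it says nothing about the data.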
And is there any visual or geometric way to find the solution instead?