I was going through the definition and meaning of variance and covariance. The resources I have give only the definitions and formulas, without any insight.
For variance, I wrote down the formula and asked myself what it tells me. I figured out that variance has something to do with the mean of the spread: the term $(x_i-\bar x)^2$ in the formula takes the square of the distance between the $i^{th}$ observation and the mean.
Moving further, if we have data on two attributes, we can plot a scatter diagram. I compared this situation with the notion of centre of mass in 2D: the mean of the scatter plot will be $(\bar x,\bar y)$. Covariance extends the idea of variance to higher dimensions (this was given in the book).
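To check my reading of the variance formula, here is a minimal Python sketch. The sample values and the helper names `mean` and `variance` are made up by me for illustration, and I am assuming the population convention (dividing by $n$ rather than $n-1$).

```python
# Made-up sample, purely for illustration.
xs = [2.0, 4.0, 4.0, 6.0]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def variance(values):
    """Population variance: the average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


x_bar = mean(xs)
var_x = variance(xs)
print(x_bar, var_x)
```

Each term `(v - m) ** 2` is the squared distance of one observation from the mean, and the variance is just the average of those squared distances.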
$$\mathbb{Cov}(X,Y)=\mathbb{E}[(x_i-\bar x)(y_i-\bar y)]$$
Above is the formula for covariance. I could not understand why this particular formula is used.
Second, if we have a point $(x_i,y_i)$ in the scatter plot and the mean is $(\bar x,\bar y)$, then the square of the distance between them is $(x_i-\bar x)^2+(y_i-\bar y)^2$. So I thought that if we are generalising the concept of variance to two dimensions, then the formula for covariance should be:
$$\mathbb{Cov}(X,Y)=\mathbb{E}[(x_i-\bar x)^2+(y_i-\bar y)^2]=\mathbb{Var}(X)+\mathbb{Var}(Y)$$
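To see concretely that my proposed quantity is not the same as the book's covariance, here is a small Python sketch. The data and the function names (`book_covariance`, `proposed_quantity`) are hypothetical, chosen by me for illustration, and I am again assuming the population convention (dividing by $n$).

```python
# Made-up paired observations, purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Average squared distance from the mean (population convention).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def book_covariance(xs, ys):
    # The book's formula: average of the products of the deviations.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def proposed_quantity(xs, ys):
    # My proposal: average squared distance from (x_i, y_i) to the 2D mean.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) ** 2 + (y - my) ** 2 for x, y in zip(xs, ys)) / len(xs)


cov_xy = book_covariance(xs, ys)
proposed = proposed_quantity(xs, ys)
sum_of_variances = variance(xs) + variance(ys)
print(cov_xy, proposed, sum_of_variances)
```

On this data my proposed quantity agrees with $\mathbb{Var}(X)+\mathbb{Var}(Y)$, as in the formula above, but it gives a different number from the book's covariance, so the two formulas are clearly measuring different things.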
Summary of my problem
In the formula for variance we have "the square of the distance between the $i^{th}$ observation and the mean", and covariance is the same as variance but for two or more dimensions. Why, then, are we not using "the square of the distance between the $i^{th}$ observation and the mean"? Why are we using something else, and in particular, why that formula?
Please help, I am struggling to digest this.