2

Let $X$ and $Y$ be two random variables with joint distribution $P_{X,Y}$ and marginal distributions $P_X$ and $P_Y$. The Pearson correlation coefficient is defined to be $$\rho_{X,Y}=\dfrac{\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y)}{\sigma_X\sigma_Y}\tag{1}$$ where $\mathbb{E}$ means the mean value and $\sigma_X,\sigma_Y$ are the respective standard deviations.

This is meant to be a quantifier of correlation. As Wikipedia's page puts it:

In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data.

My question is: given this intuitive idea about correlation, what is the motivation to define (1) as a quantifier of correlation? How do we motivate definition (1)?

It is also hinted at on the linked page that, mathematically, $\rho_{X,Y}$ "is defined as the quality of least squares fitting to the original data". But I still fail to see why this would be a good quantifier of correlation.

Gold
  • 26,547
  • Usually in a definition you write the numerator as $E[(X-E[X])(Y-E[Y])]$ which is basically taking into account when $X$ and $Y$ differ from their means in the same way vs different ways. The denominator is there to scale it down to fit in $[-1,1]$ which it does in fact do by the Cauchy-Schwarz inequality. – Ian Feb 29 '20 at 15:06

2 Answers

3

It helps to instead write an equivalent definition,$$\rho_{X,\,Y}=\frac{\Bbb E((X-\Bbb EX)(Y-\Bbb EY))}{\sigma_X\sigma_Y}.$$This is a covariance divided by a product of standard deviations. I've explained before that covariance is an inner product (with some qualifying statements you'll find at that link). Then standard deviation is like a length (variance being the squared length), so the above formula is like$$\cos\theta=\frac{a\cdot b}{|a||b|}.$$In particular, perfectly correlated variables are "parallel" in a vector space of random variables, whereas uncorrelated ones are orthogonal.
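To make the analogy concrete, here is a small numerical sketch (my own illustration, not part of the original answer) treating two samples as vectors: centering them turns the sample covariance into a Euclidean inner product, and the cosine of the angle between the centered vectors coincides with the usual sample correlation. The data and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a linear relationship plus noise.
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)

# Center the samples: covariance is the inner product of centered vectors.
xc = x - x.mean()
yc = y - y.mean()

# Pearson correlation as the cosine of the angle between centered vectors.
cos_theta = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Agrees with the textbook formula as computed by NumPy.
rho = np.corrcoef(x, y)[0, 1]
print(cos_theta, rho)
```

The two numbers agree to machine precision, since `np.corrcoef` computes exactly this cosine after centering.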

J.G.
  • 115,835
  • 1
    The only gap here is "why this inner product rather than some other one?". – Ian Feb 29 '20 at 15:09
  • 1
    @Ian I'll expand a point I made in a comment on the linked answer, when that arose: we regard uncorrelated (and certainly independent) variables as orthogonal for a geometric intuition about changing one variable by following a Cartesian axis, and to define an inner product we only need decide what a "length" is, so it may as well be standard deviation. – J.G. Feb 29 '20 at 15:13
  • @J.G. thanks for the answer. I think I get your point: if we see $X,Y$ are elements of a vector space and if "following the direction of $X$ does not affect $Y$" - as in the case they are orthogonal - then they are independent, because we may change one without affecting the other, right? But isn't this, in a sense, just taking into account the possibility of a linear constraint between $X$ and $Y$? Couldn't they be related in some non-linear way and therefore be correlated and not captured by $\rho_{X,Y}$? – Gold Feb 29 '20 at 15:17
  • 1
    @J.G. I do think your polarization identity point is a nice one, it makes the choice of an inner product seem significantly less arbitrary. – Ian Feb 29 '20 at 16:03
  • @user1620696 Correlation in Pearson's sense is linear correlation by definition. – Ian Feb 29 '20 at 16:03
  • @user1620696 You're right, as is Ian's clarification. Being uncorrelated proves very little. – J.G. Feb 29 '20 at 16:13
1

Initially, Pearson's correlation coefficient was introduced in the context of linear regression (e.g. Pearson, 1896): $$ Y=\alpha+\beta X+\epsilon, $$ where $\mathsf{E}[\epsilon\mid X]=0$. In this case $$ \beta=\rho_{X,Y}\times\frac{\sigma_Y}{\sigma_X}. $$ So $\rho_{X,Y}$ is a measure of linear dependence between $X$ and $Y$ and typically fails to account for nonlinear dependence, e.g. when $Y=X^2$ and the distribution of $X$ is symmetric about the origin, $\rho_{X,Y}=0$.
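Both claims in this answer can be checked numerically. The sketch below (my own illustration; the coefficients and seed are arbitrary) verifies that the least-squares slope equals $\rho_{X,Y}\,\sigma_Y/\sigma_X$ on a linear example, and that $Y=X^2$ with $X$ symmetric about the origin gives a correlation near zero despite $Y$ being fully determined by $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)  # symmetric about the origin

# Linear case: the least-squares slope equals rho * sigma_Y / sigma_X.
y = 3.0 + 0.5 * x + rng.normal(size=x.size)
rho = np.corrcoef(x, y)[0, 1]
slope = np.polyfit(x, y, 1)[0]        # least-squares estimate of beta
print(slope, rho * y.std() / x.std())  # the two agree

# Nonlinear case: Y = X^2 is a deterministic function of X, yet rho ≈ 0.
y2 = x ** 2
print(np.corrcoef(x, y2)[0, 1])
```

The identity holds exactly for the sample quantities (the degrees-of-freedom convention cancels in the ratio), while the $Y=X^2$ correlation is only near zero, shrinking as the sample grows.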