Let $X$ and $Y$ be two random variables with joint distribution $P_{X,Y}$ and marginal distributions $P_X$ and $P_Y$. The Pearson correlation coefficient is defined to be $$\rho_{X,Y}=\dfrac{\mathbb{E}(XY)-\mathbb{E}(X)\mathbb{E}(Y)}{\sigma_X\sigma_Y}\tag{1}$$ where $\mathbb{E}$ denotes the expected value and $\sigma_X,\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
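To make (1) concrete, here is a minimal sketch that evaluates its sample analogue on simulated data and compares the result with NumPy's built-in `np.corrcoef`. The helper name `pearson_from_definition` and the simulated data are my own illustrative assumptions, and the expectations in (1) are replaced by sample means.

```python
import numpy as np


def pearson_from_definition(x, y):
    """Sample analogue of (1): expectations replaced by sample means."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Numerator of (1): E(XY) - E(X)E(Y), i.e. the covariance.
    cov = np.mean(x * y) - np.mean(x) * np.mean(y)
    # Denominator of (1): product of the (population-style) standard deviations.
    return cov / (np.std(x) * np.std(y))


# Illustrative data: y depends linearly on x, plus independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2.0 * x + rng.normal(size=1000)

rho_def = pearson_from_definition(x, y)
rho_numpy = np.corrcoef(x, y)[0, 1]
print(rho_def, rho_numpy)  # the two values should agree up to rounding
```

The two numbers should coincide up to floating-point error, since the normalisation convention cancels between numerator and denominator.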
This is meant to quantify correlation. As Wikipedia's page puts it:
In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data.
My question is: given this intuitive idea about correlation, what is the motivation to define (1) as a quantifier of correlation? How do we motivate definition (1)?
The linked page also says of $\rho_{X,Y}$: "Mathematically, it is defined as the quality of least squares fitting to the original data". But I still fail to see why this would make it a good quantifier of correlation.
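As far as I understand it, the quoted claim refers to the fact that, for a least-squares line with intercept, the fraction of the variance of $Y$ explained by the fit equals $\rho_{X,Y}^2$. Here is a minimal sketch checking this numerically on simulated data. The data and variable names are my own assumptions, and this only illustrates the claim; it does not by itself explain the motivation.

```python
import numpy as np

# Illustrative data: a noisy, decreasing linear relationship.
rng = np.random.default_rng(1)
xs = rng.normal(size=500)
ys = -1.5 * xs + 0.5 + rng.normal(size=500)

# Ordinary least-squares line ys ~ slope * xs + intercept.
slope, intercept = np.polyfit(xs, ys, deg=1)
residuals = ys - (slope * xs + intercept)

# Fraction of the variance of ys explained by the fitted line.
r_squared = 1.0 - np.sum(residuals**2) / np.sum((ys - ys.mean()) ** 2)

rho_fit = np.corrcoef(xs, ys)[0, 1]
print(r_squared, rho_fit**2)  # these should agree up to rounding
```

The sign of $\rho_{X,Y}$ matches the sign of the fitted slope, so in this sense $\rho_{X,Y}$ records both the direction and the quality of the linear fit.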