This is my first post on math.stackexchange and I am not a mathematician, but I took some undergraduate math courses and some graduate courses on mathematical modelling, so I come with a basic understanding of math and have already computed a lot of eigenvectors and eigenvalues. I hope my notation is easy to follow.
I am trying to describe a two-dimensional scatter plot by 3 parameters, which define the shape of an ellipse (i.e. the length of the major axis, the length of the minor axis, and the angle between the major axis and the ordinate of a Cartesian coordinate system). The reason is that the scatter plot changes over time, and I want to track that change by tracking the change of those 3 parameters, i.e. the shape of the ellipse. I do not need a confidence region or anything like that, just the parameters of an ellipse that fits my data.
All I have (and need) are the x- and y-coordinates of the scatter plot. I understand that the eigenvectors of the covariance matrix give me the orientation of the major and minor axes, which in turn gives me the angle between the major axis and the ordinate (just as in a principal component analysis). What I do not understand is how to get the lengths of the major and minor axes from the covariance matrix. I found two different approaches:
- A similar post on math.stackexchange links to source code on Wikipedia. There, a Cholesky decomposition of the covariance matrix is computed and used to transform a unit circle. I cannot follow this mathematically, nor do I know why it would be correct.
- A similar post on stackoverflow defines the width and the height (i.e. the lengths of the major and minor axes) as 2 * nstd * np.sqrt(vals), where vals are the eigenvalues of the covariance matrix and nstd denotes the number of standard deviations. In my case nstd would be 3.
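A minimal numerical check of the first approach (my own sketch, not the linked Wikipedia code): if $Cov = LL^T$ is the Cholesky decomposition, then applying $L$ to points on the unit circle traces out an ellipse, and applying $L$ to standard-normal samples produces data whose covariance is exactly $Cov$ — which would explain why transforming a unit circle by $L$ gives the right ellipse.

```python
import numpy as np

cov = np.array([[9.0, 2.4], [2.4, 4.0]])
L = np.linalg.cholesky(cov)                  # lower-triangular, cov = L @ L.T

# Points on the unit circle, shape (2, N)
t = np.linspace(0.0, 2.0 * np.pi, 200)
circle = np.vstack([np.cos(t), np.sin(t)])

# Transforming the circle by L yields the "1-sigma" ellipse of cov
ellipse = L @ circle

# Sanity check: if z ~ N(0, I), then L @ z has covariance L @ I @ L.T = cov
z = np.random.default_rng(0).standard_normal((2, 100_000))
samples = L @ z
print(np.cov(samples))                       # approximately cov
```

The transformed circle is the same ellipse as the one from the eigen-decomposition, just parametrized differently.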
I can follow the second approach better; I used the Python code and the result looks great! But I do not understand why the axis length scales with the square root of the eigenvalues. I think there must be a simple relation between an eigenvalue of a covariance matrix and the standard deviation of the data projected onto the corresponding ellipse axis, which I just cannot find in the literature and cannot derive myself. Can anyone enlighten me on this, or point me to something to read so I can understand it better?
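The relation I am looking for can at least be checked numerically (my own sketch; the normal distribution is assumed only to generate test data): projecting the data onto a unit eigenvector $v$ of the sample covariance matrix $S$ gives a one-dimensional sample whose variance is $v^T S v = \lambda$, i.e. the eigenvalue itself, so the standard deviation along that axis is $\sqrt{\lambda}$.

```python
import numpy as np

rng = np.random.default_rng(42)
cov = np.array([[9.0, 2.4], [2.4, 4.0]])
data = rng.multivariate_normal([0.0, 0.0], cov, size=200_000)  # shape (N, 2)

S = np.cov(data.T)                        # sample covariance matrix
eigenvals, eigenvecs = np.linalg.eigh(S)  # eigh: ascending, orthonormal vectors

for i in range(2):
    v = eigenvecs[:, i]                   # unit eigenvector
    projected = data @ v                  # scalar projection of every point
    # variance along this axis equals the eigenvalue: v.T @ S @ v = lambda_i
    print(projected.var(ddof=1), eigenvals[i])
```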
Best, Nras.
Update: I got the textbook "Circular Statistics in Biology" by Batschelet, 1981. On pp. 268-274 the semi-axes and the inclination angle are calculated for an example with $\sigma_1=3$ and $\sigma_2=2$ and the correlation coefficient $\rho=0.4$. This yields the covariance matrix $$\begin{align*} Cov & =\left[ {\begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\\ \end{array} } \right] \\ &= \left[ {\begin{array}{cc} 9.0 & 2.4\\ 2.4 & 4.0\\ \end{array} } \right] \end{align*}$$ The book then goes through equations 14.4.8 to 14.4.11 and finds the results $$a=3.157,\quad b=1.742,\quad \Theta=21.9^\circ$$ The result just magically appears, as I cannot find the referenced equations: they stop after 14.4.7, right before this example, and the chapter finishes after the example. What a pity!
Running the code in Python, where the semi-axis length is defined as the square root of the eigenvalue, I obtain the same result:
```python
>>> import numpy
>>> cov = numpy.array([[9., 2.4], [2.4, 4.]])
>>> eigenvals, eigenvecs = numpy.linalg.eig(cov)
>>> numpy.sqrt(eigenvals)
array([ 3.1568251,  1.7419688])
>>> numpy.rad2deg(numpy.arccos(eigenvecs[0, 0]))
21.915430336046288
```
So there must be something that explains why the ellipse semi-axis length scales with the square root of the eigenvalue. I now know that it is true, but I still do not know why.
Update 2: The referenced equations really do not exist, but the calculation is sketched in the previous chapter, 13.8. Summary: let distributions 1 and 2 be centered at $\mu_1$ and $\mu_2$, respectively, and let $$ A=\sigma_2^2,\quad B=-\rho\sigma_1\sigma_2,\quad C=\sigma_1^2,\quad D=(1-\rho^2)\sigma_1^2\sigma_2^2.$$ Then the ellipse follows the equation $$ A(x-\mu_1)^2 + 2B(x-\mu_1)(y-\mu_2) + C(y-\mu_2)^2 = D.$$ From that equation the semi-axis lengths and the inclination of the major axis are computed as follows: let $$R=\left[(A-C)^2 + 4B^2\right]^{1/2},$$ then the semi-major axis $a$ and the semi-minor axis $b$ are $$b, a = \left(\dfrac{2D}{A+C\pm R}\right)^{1/2}$$ and the inclination is $$\Theta = \arctan\left(\dfrac{2B}{A-C-R}\right).$$ Substituting the values from the example yields exactly the same solution as the square-root-of-the-eigenvalues approach, with the inclination obtained from the arccos of the eigenvector.
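Both routes can be compared numerically with the example values (my own sketch of the formulas above, next to the eigenvalue computation):

```python
import numpy as np

sigma1, sigma2, rho = 3.0, 2.0, 0.4

# Batschelet's quantities (chapter 13.8)
A = sigma2**2
B = -rho * sigma1 * sigma2
C = sigma1**2
D = (1.0 - rho**2) * sigma1**2 * sigma2**2
R = np.sqrt((A - C)**2 + 4.0 * B**2)

a = np.sqrt(2.0 * D / (A + C - R))           # semi-major axis
b = np.sqrt(2.0 * D / (A + C + R))           # semi-minor axis
theta = np.degrees(np.arctan(2.0 * B / (A - C - R)))

# Eigenvalue route
cov = np.array([[sigma1**2, rho * sigma1 * sigma2],
                [rho * sigma1 * sigma2, sigma2**2]])
eigenvals, eigenvecs = np.linalg.eig(cov)

print(a, b, theta)           # approximately 3.157, 1.742, 21.9
print(np.sqrt(eigenvals))    # same semi-axes (possibly in another order)
```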
From here it cannot be much work to show that $a=\sqrt{\lambda_1}$ and $b=\sqrt{\lambda_2}$, where $\lambda_i$ are the eigenvalues of the covariance matrix.
> The eigenvectors returned by `numpy.linalg.eig` have unit length. Therefore, the eigenvalues directly relate to the variances (and the standard deviation is the square root of the variance). It's easiest to think about the uncorrelated case first. In that case, the eigenvectors are parallel to the coordinate system, and the eigenvalues are just the variances of the data (and the square root of those is the standard deviation). (Of course, relating the standard deviation to a confidence interval assumes a normal distribution, but that's a separate issue.) – Joe Kington Aug 28 '14 at 15:04
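The uncorrelated case from the comment is easy to verify numerically (my own sketch): with $\rho = 0$ the covariance matrix is diagonal, so the eigenvectors lie along the coordinate axes and the eigenvalues are exactly $\sigma_1^2$ and $\sigma_2^2$.

```python
import numpy as np

# rho = 0: diagonal covariance, sigma1^2 = 9, sigma2^2 = 4
cov = np.diag([9.0, 4.0])
eigenvals, eigenvecs = np.linalg.eig(cov)

print(np.sqrt(eigenvals))   # [3. 2.] -> the standard deviations
print(eigenvecs)            # identity: axes parallel to the coordinate system
```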