
Suppose $\mathbf{A}$ is a symmetric positive-definite matrix and we want to maximize the function $f(\mathbf{x})=\mathbf{x}^{\mathrm{T}}\mathbf{A}\mathbf{x}$ subject to the constraint $\mathbf{x}^{\mathrm{T}}\mathbf{x}=1$. Using a Lagrange multiplier we have $L(\mathbf{x})=\mathbf{x}^{\mathrm{T}}\mathbf{A}\mathbf{x}-\lambda(\mathbf{x}^{\mathrm{T}}\mathbf{x}-1)$, and setting the gradient to zero gives $\nabla L(\mathbf{x})=2(\mathbf{A}-\lambda\mathbf{I})\mathbf{x}=\mathbf{0}$, whose solutions are the eigenvectors of $\mathbf{A}$.
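As a numerical sanity check (not a proof), here is a small pure-Python sketch. The symmetric positive-definite matrix $A=\begin{pmatrix}3&1\\1&2\end{pmatrix}$ is just an assumed example: sampling $x=(\cos\theta,\sin\theta)$ on the unit circle (which enforces $x^{\mathrm T}x=1$) shows that the maximum of $f(x)=x^{\mathrm T}Ax$ coincides with the largest eigenvalue $(5+\sqrt5)/2\approx 3.618$.

```python
import math

# Assumed example of a symmetric positive-definite matrix A = [[3, 1], [1, 2]].
a11, a12, a22 = 3.0, 1.0, 2.0

def f(x1, x2):
    """Quadratic form f(x) = x^T A x for the 2x2 symmetric A above."""
    return a11 * x1 * x1 + 2 * a12 * x1 * x2 + a22 * x2 * x2

# Sample x = (cos t, sin t) on the unit circle, so x^T x = 1 automatically.
n = 100_000
max_f = max(f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n))

# Largest eigenvalue of A: larger root of l^2 - tr(A) l + det(A) = 0.
tr, det = a11 + a22, a11 * a22 - a12 * a12
lam1 = (tr + math.sqrt(tr * tr - 4 * det)) / 2

print(max_f, lam1)  # both approximately 3.618
```

The grid is fine enough that the sampled maximum agrees with $\lambda_1$ to well below $10^{-6}$.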

My question is how to prove that the solutions (the eigenvectors) are indeed maxima of $f(\mathbf{x})$ rather than minima. I am not sure, but I think this is related to the Hessian matrix; I found in the question "Hessian matrix of a quadratic form" that the Hessian of a quadratic form seems to be $\mathbf{A}+\mathbf{A}^{\mathrm{T}}$, but I don't know how to use it under a constraint. I am posting this question to ask for help with a complete proof. Thank you.


P.S. The background of this question is the widely-used statistical method Principal Component Analysis. A related question is "Why do the principal components correspond to the eigenvalues?" if you are interested.

Tony

1 Answer


First, if $A$ is not symmetric, then you cannot say that $\nabla L(x) = (A-\lambda I)x$; all you can deduce is that $\nabla L(x) = \left(\frac{A+A^T}{2}-\lambda I\right)x$ (up to an overall factor of $2$). Let us write $\Re A = \frac{A+A^T}{2}$.
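For completeness, here is the gradient computation behind that remark. Writing $g(x)=x^TAx=\sum_{i,j}A_{ij}x_ix_j$,
$$\frac{\partial g}{\partial x_k}=\sum_j A_{kj}x_j+\sum_i A_{ik}x_i=\big((A+A^T)x\big)_k,$$
so $\nabla(x^TAx)=(A+A^T)x=2(\Re A)x$, which reduces to $2Ax$ only when $A=A^T$.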

At a maximum or minimum we necessarily have $(\Re A)x_i=\lambda_i x_i$, i.e. $\lambda_i$ is an eigenvalue of $\Re A$ and the corresponding $x_i$ is an eigenvector of $\Re A$. We will suppose that $\lambda_1\ge \lambda_2\ge\ldots$.

Moreover, $\Re A$ is a symmetric matrix, hence all its eigenvalues are real and its eigenvectors can be chosen to form an orthonormal basis.

Finally, we write any vector $y$ with $\|y\|=1$ as $$y=\sum_i a_ix_i.$$ The coefficients $a_i$ satisfy $\sum_ia_i^2=1$. Looking at $y^TAy$ (note that $y^TAy=y^T(\Re A)y$), we obtain $$y^TAy = \sum_i \lambda_i a_i^2$$ subject to the constraint $\sum_ia_i^2=1$. This is an easy optimisation problem whose maximum is attained when $a_1=\pm1$, i.e. when $y$ is a unit eigenvector for the largest eigenvalue $\lambda_1$.
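The expansion step can be checked numerically. A minimal pure-Python sketch, where the $2\times2$ symmetric matrix is an assumed example: expand a unit vector $y$ in the orthonormal eigenbasis and confirm that $y^TAy=\sum_i\lambda_i a_i^2\le\lambda_1$.

```python
import math

# Assumed example: symmetric A = [[3, 1], [1, 2]].
A = [[3.0, 1.0], [1.0, 2.0]]

# Eigenvalues: roots of l^2 - tr(A) l + det(A) = 0, ordered lam[0] >= lam[1].
tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(tr * tr - 4 * det)
lam = [(tr + disc) / 2, (tr - disc) / 2]

# Unit eigenvectors: first row of (A - l I) x = 0 gives x ~ (A[0][1], l - A[0][0]).
vecs = []
for l in lam:
    v = (A[0][1], l - A[0][0])
    norm = math.hypot(v[0], v[1])
    vecs.append((v[0] / norm, v[1] / norm))

# Expand an arbitrary unit vector y in the eigenbasis: a_i = <y, x_i>.
t = 0.7                                  # arbitrary angle
y = (math.cos(t), math.sin(t))
a = [y[0] * v[0] + y[1] * v[1] for v in vecs]

quad = sum(A[i][j] * y[i] * y[j] for i in range(2) for j in range(2))  # y^T A y
spectral = sum(l * c * c for l, c in zip(lam, a))                      # sum_i lambda_i a_i^2

print(quad, spectral)  # equal, and both bounded above by lam[0]
```

Since the $x_i$ are orthonormal and $\|y\|=1$, the coefficients satisfy $\sum_i a_i^2=1$, which the code confirms as well.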

TZakrevskiy
  • Thank you for the reply. $\bf{A}$ is a symmetric matrix. – Tony Apr 03 '15 at 15:53
  • I am a bit confused. Why $y^TAy = \sum_i \lambda_i a_i^2$? – Tony Apr 03 '15 at 16:08
  • @Tony $y=\sum_i a_ix_i,$ hence $Ay =\sum_i \lambda_ia_ix_i $ because $x_i$ are eigenvectors. Then again, $x_i$ are an orthonormal basis, hence $y^TAy = (\sum_j a_jx_j,\sum_i \lambda_ia_ix_i ) =\sum_i \lambda_i a_i^2 $. – TZakrevskiy Apr 03 '15 at 16:10
  • Thank you. I have another question. What's the purpose of this step $y=\sum_i a_ix_i$? – Tony Apr 03 '15 at 16:21
  • @Tony to simplify the maximisation problem. – TZakrevskiy Apr 03 '15 at 16:22
  • I am not sure but I think the solution is showing $f(\bf{x})$ increases with the eigenvalues and their corresponding eigenvectors, but not proving they are maxima. – Tony Apr 03 '15 at 17:00