
I am currently learning the mathematics behind PCA, and I found that when PCA maximizes variance to find the 2nd, 3rd, ... components, it uses zero covariance as the constraint, as shown below:

[Image: the $k$-th loading vector solves $\max_{\mathbf{a}_k} \operatorname{var}(\mathbf{a}_k^{\mathrm{T}}\mathbf{X})$ subject to $\mathbf{a}_k^{\mathrm{T}}\mathbf{a}_k = 1$ and $\operatorname{cov}(\mathbf{a}_k^{\mathrm{T}}\mathbf{X}, \mathbf{a}_j^{\mathrm{T}}\mathbf{X}) = 0$ for $j < k$.]

However, PCA is also supposed to be an orthogonal transformation, and I am puzzled about how the zero-covariance constraint can ensure orthogonality. Any hint, or recommendation of reading or lecture videos, is welcome.

Tony

1 Answer


I finally figured this out! I am posting the answer in the hope that it might be helpful to someone else. Two other questions I asked that are related to this one are:

  1. A question about the covariance of two linear combinations
  2. Why do the principal components correspond to the eigenvalues?

By the first question, I figured out that $\operatorname{cov}(\mathbf{a}^{\mathrm{T}}\mathbf{X}, \mathbf{b}^{\mathrm{T}}\mathbf{X}) = \mathbf{a}^{\mathrm{T}}\operatorname{var}(\mathbf{X})\,\mathbf{b}$.
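
As a quick numerical sanity check of this identity, here is a minimal NumPy sketch (the data and the vectors $\mathbf{a}$, $\mathbf{b}$ are made up for illustration; they are not from the linked question):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1000))      # 3 variables, 1000 observations
a = np.array([1.0, 2.0, -1.0])
b = np.array([0.5, -1.0, 3.0])

S = np.cov(X)                       # sample covariance matrix var(X)
lhs = np.cov(a @ X, b @ X)[0, 1]    # cov(a^T X, b^T X)
rhs = a @ S @ b                     # a^T var(X) b
print(np.isclose(lhs, rhs))         # True
```

The identity holds exactly for the sample covariance as well, since centering commutes with the linear maps $\mathbf{a}^{\mathrm{T}}$ and $\mathbf{b}^{\mathrm{T}}$.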

By the second question, I figured out that the loadings of the principal components are orthogonal eigenvectors of the covariance matrix $\operatorname{var}(\mathbf{X})$. Note that the covariance matrix is a real symmetric $n \times n$ matrix, so by the spectral theorem it has $n$ mutually orthogonal eigenvectors.

As a result, suppose $\mathbf{a}_1$ and $\mathbf{a}_2$ are the loading vectors of two principal components, and let $\lambda_2$ be the eigenvalue associated with $\mathbf{a}_2$. Then $\operatorname{cov}(\mathbf{a}_1^{\mathrm{T}}\mathbf{X}, \mathbf{a}_2^{\mathrm{T}}\mathbf{X}) = \mathbf{a}_1^{\mathrm{T}}\operatorname{var}(\mathbf{X})\,\mathbf{a}_2 = \mathbf{a}_1^{\mathrm{T}}\lambda_2\mathbf{a}_2 = \lambda_2\,\mathbf{a}_1^{\mathrm{T}}\mathbf{a}_2 = 0$, since the eigenvectors $\mathbf{a}_1$ and $\mathbf{a}_2$ are orthogonal. This completes the proof.
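
To illustrate the whole argument numerically, here is another NumPy sketch (using `np.linalg.eigh` to get the eigenvectors is my choice for the illustration, not part of the derivation above):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1000))      # 3 variables, 1000 observations

S = np.cov(X)                       # covariance matrix var(X): real and symmetric
eigvals, A = np.linalg.eigh(S)      # columns of A are orthonormal eigenvectors (the loadings)

# The loading vectors are orthogonal: A^T A is the identity matrix.
print(np.allclose(A.T @ A, np.eye(3)))           # True

# The components a_i^T X are uncorrelated: their covariance matrix is diagonal,
# with the eigenvalues of var(X) on the diagonal.
Z = A.T @ X
print(np.allclose(np.cov(Z), np.diag(eigvals)))  # True
```

So orthogonality of the loadings and zero covariance of the components are two sides of the same computation: $\operatorname{cov}(\mathbf{a}_1^{\mathrm{T}}\mathbf{X}, \mathbf{a}_2^{\mathrm{T}}\mathbf{X})$ is a multiple of $\mathbf{a}_1^{\mathrm{T}}\mathbf{a}_2$.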

Tony