
Let $p(\lambda)=\det(A-\lambda I)$ be the characteristic polynomial of an $n \times n$ matrix $A$. Then $p(A)=O.$

Let $p(\lambda)=p_{0}+p_{1}\lambda+\ldots+p_{n-1}\lambda^{n-1}+p_{n}\lambda^{n}$.
Let $Q(\lambda)$ be the adjugate matrix of the square matrix $A-\lambda I$, which may be regarded as a polynomial in $\lambda$ with matrix coefficients (easy to write out explicitly in the $2 \times 2$ and $3 \times 3$ cases):

$Q(\lambda)=Q_{0}+\lambda Q_{1}+\ldots+\lambda^{q-1}Q_{q-1}+\lambda^{q}Q_{q}$, where the $Q_{i}$ are constant matrices.

On one hand, by virtue of Cramer's Rule (Lay, p. 179, Thm 8), $(\operatorname{adj}A)A=(\det A)I$.
So, with $A-\lambda I$ in place of $A$: $(\operatorname{adj}(A-\lambda I))(A-\lambda I)=\det(A-\lambda I)\,I$ $\implies Q(\lambda)(A-\lambda I)=p(\lambda)I=p_{0}I+p_{1}\lambda I+\ldots+p_{n-1}\lambda^{n-1}I+p_{n}\lambda^{n}I.$

$1.$ What's the proof strategy? How would one anticipate the key steps, such as considering the adjugate of $A-\lambda I$, using Cramer's Rule, and writing $Q(\lambda)(A-\lambda I)$ in two ways?

On the other hand, $Q(\lambda)(A-\lambda I)=Q_{0}A+\lambda(Q_{1}A-Q_{0})+\ldots+\lambda^{q}(Q_{q}A-Q_{q-1})-\lambda^{q+1}Q_{q}.$ Equate the two expressions for $Q(\lambda)(A-\lambda I)$ with respect to powers of $\lambda$ as follows; thus $q=n-1$.

$2.$ Why $q = n - 1$?

To save space, I also multiply each of the following equalities on the left by a power of $A$, shown in orange: $Q_{0}A=p_{0}I,$
$\color{orangered}{A(} Q_{1}A-Q_{0}=p_{1}I \color{orangered}{)} ,$
$...,$
$\color{orangered}{A^{n-1}(} Q_{n-1}A-Q_{n-2}=p_{n-1}I \color{orangered}{)} ,$
$\color{orangered}{A^{n}(} -Q_{n-1}=p_{n}I \color{orangered}{)}.$

$3.$ What's the proof strategy, regarding these multiplications of $A^i$ for all $1 \le i \le n$ in orange?

Add all the equalities together: the right-hand sides sum to $p_{0}I+p_{1}A+\ldots+p_{n-1}A^{n-1}+p_{n}A^{n}=p(A)$, while the left-hand sides telescope to the zero matrix $O$ (using $AQ_{i}=Q_{i}A$, which follows from $(A-\lambda I)Q(\lambda)=Q(\lambda)(A-\lambda I)$). Hence $p(A)=O$.
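The conclusion $p(A)=O$ can be checked concretely for a sample matrix (a sketch using SymPy's `charpoly`; the $2 \times 2$ matrix is an arbitrary choice):

```python
# Check the conclusion p(A) = O for a sample matrix.
import sympy as sp

A = sp.Matrix([[1, 2],
               [3, 4]])                 # arbitrary sample matrix
coeffs = A.charpoly().all_coeffs()      # [p_n, ..., p_1, p_0], highest first

# Evaluate the characteristic polynomial at A by Horner's scheme
pA = sp.zeros(2, 2)
for c in coeffs:
    pA = pA * A + c * sp.eye(2)

assert pA == sp.zeros(2, 2)             # p(A) is the zero matrix
```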

Addendum: I picked this proof as it looks like the easiest. Please advise if others are even easier.

  • Read here for a nice explanation of the use of the adjugate matrices in two similar proofs (and also check the bogus proof): http://en.wikipedia.org/wiki/Cayley%E2%80%93Hamilton_theorem#A_bogus_.22proof.22:p.28A.29.3D_det.28AIn.C2.A0.E2.88.92.C2.A0A.29_.3D_det.28A.C2.A0.E2.88.92.C2.A0A.29_.3D_0 – DonAntonio Apr 24 '14 at 11:26

1 Answer


Background: Let $A$ be an $N\times N$ matrix over a field $F$, and suppose $P$, $Q$ are polynomials with $N\times N$ matrix coefficients $P_{j}$, $Q_{j}$, respectively. That is, $$ P(\lambda) = \sum_{j=0}^{J}\lambda^{j}P_{j},\;\;\;Q(\lambda)=\sum_{k=0}^{K}\lambda^{k}Q_{k}. $$ The problem which severely limits the usefulness of matrix polynomials is that (a) evaluation of such polynomials at a matrix $A$ is not unique, and (b) evaluation does not necessarily preserve factorings.

For example, consider left and right evaluations of $P$ at $A$ (other evaluations are possible): $$ P_{l}(A)=\sum_{j=0}^{J}A^{j}P_{j},\;\;\; P_{r}(A)=\sum_{j=0}^{J}P_{j}A^{j}. $$ The product $PQ$ is a well-defined matrix polynomial $$ (PQ)(\lambda) = \sum_{j=0}^{J}\sum_{k=0}^{K}\lambda^{j+k}P_{j}Q_{k}. $$ Notice, however, that the following are not necessarily equal: $$ (PQ)_{r}(A)=\sum_{j=0}^{J}\sum_{k=0}^{K}P_{j}Q_{k}A^{j+k},\;\;\; P_{r}(A)Q_{r}(A)=\sum_{j=0}^{J}\sum_{k=0}^{K}P_{j}A^{j}Q_{k}A^{k}. $$ Polynomial factorings are not generally preserved by left or right evaluation, or by any other evaluation. However, there are special cases where evaluation preserves factorings.
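Point (b) can be seen with a tiny concrete example (a sketch; the sample matrices are an arbitrary noncommuting pair, not taken from the proof):

```python
# Show (PQ)_r(A) != P_r(A) Q_r(A) when coefficients don't commute with A.
import sympy as sp

A = sp.Matrix([[0, 1], [1, 0]])
B = sp.Matrix([[1, 0], [0, 0]])         # B does not commute with A

# Take P(lam) = lam*B and Q(lam) = lam*B, so (PQ)(lam) = lam^2 * B*B.
PQ_r = B * B * A**2                     # (PQ)_r(A) = B^2 A^2
PrQr = (B * A) * (B * A)                # P_r(A) Q_r(A) = (B A)(B A)

assert PQ_r != PrQr                     # right evaluation broke the factoring
```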

Lemma [Matrix Polynomial Factoring]: Let $P$ and $Q$ be matrix polynomials whose coefficients are $N\times N$ matrices with entries in a field $F$. If an $N\times N$ matrix $A$ over $F$ commutes with $Q_{r}(A)$, then right evaluation satisfies $$ (PQ)_{r}(A)=P_{r}(A)Q_{r}(A). $$ Similarly, if $A$ commutes with $P_{l}(A)$, then left evaluation satisfies $$ (PQ)_{l}(A)=P_{l}(A)Q_{l}(A). $$

This simple special-case lemma is enough to give you the Cayley-Hamilton Theorem. Indeed, if $A$ is an $N\times N$ matrix over $F$, then $$ A\,\mbox{adj}(A)=\mbox{adj}(A)A=\mbox{det}(A)I, $$ where $\mbox{adj}(A)$ is the adjugate matrix of cofactors. In particular, replacing $A$ by $\lambda I-A$ gives $$ \mbox{adj}(\lambda I-A)(\lambda I-A)=p(\lambda)I, $$ where $p$ is the characteristic polynomial of $A$. It is easy to see that $$ P(\lambda)=\mbox{adj}(\lambda I-A)=P_{0}+\lambda P_{1}+\cdots+\lambda^{N-1}P_{N-1},\;\; Q(\lambda)=\lambda I-A $$ are matrix polynomials such that $A$ commutes with $Q_{r}(A)=0$. So, by the factoring lemma, $$ p(A)=(pI)_{r}(A)=(PQ)_{r}(A)=P_{r}(A)Q_{r}(A)=P_{r}(A)\,0 = 0, $$ which is a statement of the Cayley-Hamilton Theorem.
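The final chain of equalities can be checked directly for a sample matrix (a SymPy sketch; the $2 \times 2$ matrix is an arbitrary choice):

```python
# Check Q_r(A) = 0 for Q(lam) = lam*I - A, and that p(A) = 0 follows.
import sympy as sp

lam = sp.symbols('lam')
A = sp.Matrix([[1, 2], [3, 4]])         # arbitrary sample matrix
I = sp.eye(2)

# Right evaluation of Q(lam) = lam*I - A at A: coefficients -A and I
Qr = (-A) + I * A                       # = (-A)*A^0 + I*A^1 = 0
assert Qr == sp.zeros(2, 2)

# (PQ)(lam) = p(lam)*I, so (PQ)_r(A) = p(A); compute p(A) term by term
p = sp.Poly(sp.det(lam * I - A), lam)
pA = sum((c * A**k for k, c in enumerate(reversed(p.all_coeffs()))),
         sp.zeros(2, 2))
assert pA == sp.zeros(2, 2)             # Cayley-Hamilton: p(A) = 0
```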

Disintegrating By Parts