32

The companion matrix of a monic polynomial $f \in \mathbb F\left[x\right]$ in $1$ variable $x$ over a field $\mathbb F$ plays an important role in understanding the structure of finite dimensional $\mathbb F[x]$-modules.

It is an important fact that the characteristic polynomial and the minimal polynomial of $C(f)$ are both equal to $f$. This can be seen quite easily by induction on the degree of $f$.

Does anyone know a different proof of this fact? I would love to see a graph-theoretic proof or a non-inductive algebraic proof, but I would be happy with anything that makes it seem like more than a coincidence!

DBr
  • 4,780

5 Answers

25

The fact that the minimal polynomial of a companion matrix $C(f)$ is $f$ is obvious, as has been indicated above. The fact that its characteristic polynomial is also $f$ is a classical computational exercise. The computation is to be preferred over invoking Cayley-Hamilton, because this fact can serve as an ingredient in an elementary proof of that theorem (at least over fields), as has also been noted above. I will give a simpler argument below that requires no theory of modules over a PID.

First the computation of the characteristic polynomial $$\left|\matrix{x&0&0&\ldots&a_0\\ -1&x&0&\ldots&a_1\\ 0&-1&x&\ldots&a_2\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0 & \cdots & 0 & -1 & x+a_{n-1}}\right| . $$ One way is to add $x$ times the last row to the row above it, then $x$ times that row to the row above, and so on up to the first row, which results in a determinant of the form $$\left|\matrix{0&0&0&\ldots&f\\ -1&0&0&\ldots&*\\ 0&-1&0&\ldots&*\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0 & \cdots & 0 & -1 & *}~\right| = f $$ where the polynomial $f$ in the upper right corner is in fact obtained as in a Horner scheme: $f=a_0+x(a_1+x(\cdots(a_{n-2}+x(a_{n-1}+x))\cdots))$.
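The identity $\det(xI-C(f))=f(x)$ can also be spot-checked: both sides are polynomials of degree $n$, so agreement at $n+1$ or more points proves equality. Here is a minimal pure-Python sketch (the helper names `companion`, `det`, `char_poly_at`, `horner` are mine, not from the answer) using exact rational arithmetic, with `horner` evaluating $f$ by the same Horner scheme as in the row reduction:

```python
from fractions import Fraction

def companion(a):
    # a = [a_0, ..., a_{n-1}]: the non-leading coefficients of the monic
    # f = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0
    n = len(a)
    C = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = Fraction(1)          # 1's on the subdiagonal
    for i in range(n):
        C[i][n - 1] = Fraction(-a[i])      # last column: -a_i
    return C

def det(M):
    # exact determinant by Gaussian elimination over the rationals
    M = [row[:] for row in M]
    n = len(M)
    sign, d = 1, Fraction(1)
    for j in range(n):
        p = next((i for i in range(j, n) if M[i][j] != 0), None)
        if p is None:
            return Fraction(0)
        if p != j:
            M[j], M[p] = M[p], M[j]        # row swap flips the sign
            sign = -sign
        d *= M[j][j]
        for i in range(j + 1, n):
            r = M[i][j] / M[j][j]
            for k in range(j, n):
                M[i][k] -= r * M[j][k]
    return sign * d

def char_poly_at(a, x):
    # det(x I - C(f)) evaluated at the rational point x
    C = companion(a)
    n = len(a)
    return det([[(Fraction(x) if i == j else Fraction(0)) - C[i][j]
                 for j in range(n)] for i in range(n)])

def horner(a, x):
    # f(x) via the Horner scheme a_0 + x(a_1 + x(... + x(a_{n-1} + x)...))
    acc = Fraction(1)                      # leading coefficient of the monic f
    for c in reversed(a):
        acc = acc * x + c
    return acc

a = [5, -2, 7, 3]                          # f = x^4 + 3x^3 + 7x^2 - 2x + 5
print(all(char_poly_at(a, x) == horner(a, x) for x in range(-3, 4)))  # True
```

Seven sample points for a degree-$4$ polynomial identity is more than enough; any $n+1$ distinct points would do.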

Another method is to expand the determinant along the first row, and apply induction on the size. The minor that $x$ is multiplied by is again a determinant of this type, but for the polynomial $(f-a_0)/x=a_1+a_2x+\cdots+a_{n-1}x^{n-2}+x^{n-1}$, and the coefficient $a_0$ gets multiplied by $(-1)^{n-1}$ times the determinant of an upper triangular matrix of size $n-1$ with all diagonal entries $-1$, which gives $a_0$; the base case, the matrix of this type for the polynomial $a+x$, is the $1\times1$ matrix with single entry $x+a$. Again the polynomial is found as in a Horner scheme.

Yet another way is to write the determinant as $$ x^n+\left|\matrix{x&0&0&\ldots&a_0\\ -1&x&0&\ldots&a_1\\ 0&-1&x&\ldots&a_2\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0 & \cdots & 0 & -1 & a_{n-1}}\right| $$ and expand along the last column, observing that the cofactor by which the entry $a_k$ is multiplied is $(-1)^{n-1-k}$ times a minor that has a block decomposition $M=\left|{L\atop0}~{0\atop{U}}\right|$ where $L$ is a lower triangular matrix of size $k$ with entries $x$ on the diagonal, and $U$ is an upper triangular matrix of size $n-1-k$ with entries $-1$ on the diagonal, making the cofactor $x^k$, and the characteristic polynomial $f$.

Now the elementary proof of the Cayley-Hamilton theorem. Proceed by induction on $n$, the case $n=0$ being trivial. For $n>0$ take a nonzero vector $v$, and let $V$ be the subspace generated by its repeated images under the linear transformation $\phi$, which has a basis $v,\phi(v),\ldots,\phi^{d-1}(v)$ where $d=\dim(V)>0$ is the degree of the minimal polynomial $P$ that annihilates $v$ when acting by $\phi$. Extend to a basis of the whole space, in which basis $\phi$ has a matrix of the form $M=\left({A\atop0}~{{*}\atop{B}}\right)$, where $A$ is the companion matrix of $P$.

One has $\chi_M=\chi_A\chi_B$, where $\chi_A=P$, by the computation above. Now one gets zero matrices when evaluating $P$ in $A$ (because $P$ is its minimal polynomial) and (by induction) when evaluating $\chi_B$ in $B$. Thus evaluating $\chi_M=P\cdot\chi_B$ in $M$ gives a matrix product that in block form is $\left({0\atop0}~{{*}\atop{*}}\right)\cdot\left({{*}\atop0}~{{*}\atop0}\right) =\left({0\atop0}~{0\atop0}\right)$. Note that one cannot use the induction hypothesis for $A$: one might have $d=n$, in which case $A$ is no smaller than the case currently being proved (in fact this will be the case for "generic" choices of $M$ and $v$). Therefore treating the companion matrix case explicitly is really necessary in this line of reasoning.

21

Suppose your matrix is over a field $\mathbb{F}$. Look at $G = \mathbb F[x]/f$, where $f$ is your polynomial of degree $n$. Then $G$ is a vector space over $\mathbb{F}$, and $C(f)$ is the matrix (with respect to the basis $1,x,x^2,\ldots,x^{n-1}$) corresponding to the linear operator $g \mapsto x \cdot g$.

Since $f = 0$ in $G$, also $fx^i = 0$ in $G$, and so $f$ is a monic polynomial of degree $n$ such that $f(C(f)) = 0$. Moreover, any nonzero polynomial $g$ of smaller degree does not reduce to $0$ in $G$, so in particular $g(C(f))$ applied to the vector $1$ is not the zero vector. So $f$ is the minimal polynomial of $C(f)$. Since the minimal polynomial divides the characteristic polynomial (by Cayley-Hamilton) and both are monic of degree $n$, $f$ must also be the characteristic polynomial.
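The identification can be checked concretely: multiplying a residue class by $x$ and then reducing modulo $f$ acts on coefficient vectors exactly as $C(f)$ does. A small pure-Python sketch (the function names are mine, not from the answer), using the convention that $C(f)$ has $1$'s on the subdiagonal and $-a_i$ in the last column:

```python
def mul_x_mod_f(g, a):
    # g = [g_0, ..., g_{n-1}] represents g_0 + g_1 x + ... + g_{n-1} x^{n-1} in G;
    # multiply by x, then reduce via x^n = -(a_0 + a_1 x + ... + a_{n-1} x^{n-1})
    n = len(a)
    top = g[n - 1]                  # coefficient that spills over into x^n
    shifted = [0] + g[:-1]          # coefficients of x * g before reduction
    return [shifted[i] - top * a[i] for i in range(n)]

def apply_companion(g, a):
    # C(f) applied to the coefficient vector g
    n = len(a)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1             # 1's on the subdiagonal
    for i in range(n):
        C[i][n - 1] = -a[i]         # last column: -a_i
    return [sum(C[i][j] * g[j] for j in range(n)) for i in range(n)]

a = [5, -2, 7, 3]                   # f = x^4 + 3x^3 + 7x^2 - 2x + 5
g = [1, 4, 0, 2]                    # g = 1 + 4x + 2x^3
print(mul_x_mod_f(g, a) == apply_companion(g, a))  # True
```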

azimut
  • 22,696
Yuval Filmus
  • 57,157
  • 1
    Shouldn't your «thus it must be the characteristic polynomial of $C(f)$» be «thus $f$ is divisible by the minimal polynomial of $C(f)$»? – Mariano Suárez-Álvarez Nov 14 '10 at 23:12
  • @Mariano: both polynomials have the same degree. – Yuval Filmus Nov 15 '10 at 03:07
  • 8
    @YuvalFilmus: Mariano is right, it is strange to conclude that the polynomial annihilating $C(f)$ must be its characteristic polynomial, and later conclude that it must also be the minimal polynomial because that has degree $n$ too. The proper reasoning is: $f$ is the minimal polynomial because it annihilates and no lower degree polynomial does so; given this it follows by Cayley-Hamilton that it is also the characteristic polynomial. – Marc van Leeuwen Jan 31 '13 at 08:55
  • @azimut Okay, I corrected the answer along the lines of Marc's comment. – Yuval Filmus May 22 '15 at 21:51
  • Can someone explain the part: $C(f)$ is the matrix corresponding to the linear operator g↦x⋅g ? – Mathemphetamine Mar 29 '17 at 20:04
  • Denote $T:G\to G$ defined by $g\mapsto x\cdot g$. Then $[T]_\beta = C(f)$ in Friedberg's notation. Here $\beta = \{1,x,\ldots,x^{n-1}\}$. – user74489 Apr 27 '18 at 10:36
12

This is essentially Yuval's answer expressed in a slightly different way. Let your companion matrix be $$C=\pmatrix{0&1&0&\cdots&0\\ 0&0&1&\cdots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&1\\ -a_0&-a_1&-a_2&\cdots&-a_{n-1}}.$$ Then for the vector $v=(1\,\,0\,\,0\cdots 0)$, $$v\sum_{j=0}^{n-1} b_j C^j= \pmatrix{b_0&b_1&b_2&\cdots&b_{n-1}}$$ so that $g(C)\ne0$ for all nonzero polynomials $g$ of degree less than $n$. So the minimal polynomial has degree $n$, and equals the characteristic polynomial (via Cayley-Hamilton). But $vC^n=(-a_0\,\, {-a_1}\,\, {-a_2}\cdots{-a_{n-1}})$ and for $v(C^n+\sum_{j=0}^{n-1}b_j C^j)=0$ we need $a_j=b_j$. So the minimal and characteristic polynomials both equal $f$.
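This computation can be replayed directly in the row-vector convention above ($1$'s on the superdiagonal, $-a_i$ in the last row): the first $n$ powers of $C$ move $v$ through the standard basis vectors, and $vC^n$ reads off the $-a_j$. A quick pure-Python sketch (variable names are mine) for a sample quartic:

```python
# f = x^4 + 3x^3 + 7x^2 - 2x + 5, with a = [a_0, ..., a_3]
a = [5, -2, 7, 3]
n = len(a)

# companion matrix in this answer's convention:
# 1's on the superdiagonal, last row (-a_0, ..., -a_{n-1})
C = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
C.append([-ai for ai in a])

def row_times_matrix(v, M):
    # the product v M for a row vector v
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

v = [1] + [0] * (n - 1)
powers = [v]                        # powers[k] == v C^k
for _ in range(n):
    powers.append(row_times_matrix(powers[-1], C))

print(powers[:n])    # the standard basis vectors e_0, ..., e_{n-1}
print(powers[n])     # [-a_0, -a_1, -a_2, -a_3] = [-5, 2, -7, -3]
```

So a relation $v\bigl(C^n+\sum_j b_j C^j\bigr)=0$ forces $b_j = a_j$, exactly as in the answer.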

Robin Chapman
  • 22,310
10

Surprisingly, the following (in my opinion) quite elegant proof is still missing:

Look at the $F$-vector space $F[x]/(f)$. The map $$\phi : F[x]/(f)\to F[x]/(f),\quad g + (f)\mapsto x\cdot g + (f)$$ is well-defined and $F$-linear.

Let $m_\phi = \sum_{i=0}^d a_i x^i\in F[x]$ be the minimal polynomial and $\chi_\phi\in F[x]$ the characteristic polynomial of $\phi$. Then $m_\phi(\phi)$ is the zero map in $\operatorname {End}(F[x]/(f))$. Thus $$0 + (f) = m_\phi(\phi)(1 + (f)) = \sum_{i=0}^d a_i \phi^i(1 + (f)) = \left(\sum_{i=0}^d a_i x^i\right) + (f) = m_\phi + (f).$$

So $$f\mid m_\phi \mid \chi_{\phi},$$ where the second divisibility follows from Cayley-Hamilton. Because of $m_\phi \neq 0$ and $\deg(f) = \dim_F(F[x]/(f)) = \deg(\chi_\phi)$ and because all the polynomials are monic, this forces $$ f = m_\phi = \chi_\phi.$$

With respect to the basis $(1 + (f), x + (f),\ldots, x^{n-1} + (f))$, the transformation matrix of $\phi$ is the companion matrix $C(f)$ of $f$. So the minimal polynomial of $C(f)$ equals $m_\phi$ and the characteristic polynomial of $C(f)$ equals $\chi_\phi$.

azimut
  • 22,696
2

I have been thinking about the problem a bit today. What Robin and Yuval have both shown is that if the Cayley-Hamilton theorem is true, then the characteristic and minimal polynomials of $C(f)$ are both equal to $f$.

Conversely, assume that for all monic $f \in F[x]$, the characteristic and minimal polynomials of $C(f)$ are both equal to $f$.

Let $V$ be a finite dimensional $F$-vector space and $T : V \to V$ a linear transformation. We know from the classification theorem for modules over PIDs that there exist a basis $B$ of $V$ and monic polynomials $f_1, \dots, f_s \in F[x]$ such that $f_1 \mid \dots \mid f_s$ and

$$ [T]_B = \begin{pmatrix} C(f_1) & & \\ & \ddots & \\ & & C(f_s) \\ \end{pmatrix} := M$$

It is clear that the characteristic polynomial of $M$ is the product of the characteristic polynomials of the $C(f_i)$, and that the minimal polynomial of $M$ is the least common multiple of the minimal polynomials of the $C(f_i)$. We see from the assumption that the characteristic polynomial of $T$ is $f_1 f_2 \cdots f_s$ and the minimal polynomial of $T$ is $f_s$. Since $f_s$ divides $f_1 f_2 \cdots f_s$, this proves the Cayley-Hamilton theorem.
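As a concrete instance (my own example, not from the answer): take $f_1 = x - 1$ and $f_2 = (x-1)(x-2) = x^2 - 3x + 2$, so that $f_1 \mid f_2$. With the last-column convention for companion matrices,

$$ M = \begin{pmatrix} C(f_1) & \\ & C(f_2) \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -2 \\ 0 & 1 & 3 \end{pmatrix}, $$

whose characteristic polynomial is $f_1 f_2 = (x-1)^2(x-2)$ and whose minimal polynomial is $\operatorname{lcm}(f_1, f_2) = f_2$; one checks directly that $(M - I)(M - 2I) = 0$ while neither linear factor alone annihilates $M$.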

This shows that the Cayley-Hamilton theorem is equivalent to the fact "for all monic $f \in F[x]$, the characteristic and minimal polynomials of $C(f)$ are both equal to $f$".

Proving the Cayley-Hamilton theorem without assuming knowledge of modules over PIDs or companion matrices is quite delicate (from what I remember from first year university).

This seems to support the idea that in order to prove the Cayley-Hamilton theorem (or the fact about companion matrices), you need to at some point get your hands dirty (whether it be directly computing the minimal and characteristic polynomials of a companion matrix or going through a delicate proof of the Cayley-Hamilton theorem).

DBr
  • 4,780
  • There are several nice ways to prove Cayley-Hamilton without doing anything delicate. Since it's a polynomial identity, it suffices to prove it over some field of characteristic zero. Over C it's obvious because it's obvious for diagonalizable matrices, and those are dense. Alternately, one can work "universally," i.e. in Z[x_{ij}]. This method is demonstrated here: http://mathoverflow.net/questions/32133/expressing-adja-as-a-polynomial-in-a/32303#32303 – Qiaochu Yuan Nov 14 '10 at 22:51
  • I still have high hopes for someone giving us a combinatorial proof though :D – DBr Nov 14 '10 at 22:52
  • @Qiaochu: I suppose my last comment was a bit much! Thanks for the argument and link – DBr Nov 14 '10 at 23:01
  • @DBr: I am skeptical about a combinatorial proof. Combinatorics is good at proving identities but to prove that the minimal polynomial is f one has to prove a non-identity (that is, that no smaller polynomial works) which seems much less amenable to combinatorial techniques. Of course, I could be wrong. – Qiaochu Yuan Nov 14 '10 at 23:19
  • @Qiaochu: That's easy, just take $e_1$ and see what happens to it under $n-1$-degree polynomials. – Yuval Filmus Nov 15 '10 at 03:08
  • @DBr: Cayley and Hamilton are two different people. – Yuval Filmus Nov 15 '10 at 03:09
  • @Yuval Filmus: can you elaborate? – Qiaochu Yuan Nov 15 '10 at 09:30
  • 1
    @Qiaochu: Let's renumber the basis $e_0,\ldots$. Then for $k < n$, $A^k e_0 = e_k$. So the only polynomial $P$ of degree less than $n$ such that $P(A)e_0 = 0$ is $P = 0$. – Yuval Filmus Nov 15 '10 at 20:04
  • @Yuval Filmus: yes, I know. I wasn't saying that this is hard, I was saying it's unclear what good combinatorics would do in proving it. – Qiaochu Yuan Nov 15 '10 at 20:59
  • 1
    For proving the Cayley-Hamilton theorem using the computation that $f$ is the characteristic polynomial of $C_f$, one doesn't really need the decomposition into a direct sum of cyclic modules, and therefore the theory of mudules over a PID (although knowing such a decomposition exist does clarify the situation). In fact the more trivial fact that there is a composition series with cyclic modules as factors suffices. See my answer. – Marc van Leeuwen Feb 01 '13 at 08:18
  • 1
    Combinatorial proof of Cayley-Hamilton theorem is given in: http://scholarworks.sjsu.edu/cgi/viewcontent.cgi?article=4023&context=etd_theses – Sungjin Kim Sep 28 '13 at 06:49