The fact that the minimal polynomial of a companion matrix $C(f)$ is $f$ is obvious, as indicated above. The fact that its characteristic polynomial is also $f$ is a classical computation exercise. The direct computation is to be preferred over an appeal to Cayley-Hamilton, because this fact can serve as an ingredient in an elementary proof of that theorem (at least over fields), as mentioned above. I will give such an argument below, one that requires no modules over a PID.
First, the computation of the characteristic polynomial
$$\left|\matrix{x&0&0&\ldots&a_0\\
-1&x&0&\ldots&a_1\\
0&-1&x&\ldots&a_2\\
\vdots&\ddots&\ddots&\ddots&\vdots\\
0 & \cdots & 0 & -1 & x+a_{n-1}}\right|
.
$$
One way is to add $x$ times the last row to the row above it, then $x$ times that row to the row before it, and so on up to the first row, which results in a determinant of the form
$$\left|\matrix{0&0&0&\ldots&f\\
-1&0&0&\ldots&*\\
0&-1&0&\ldots&*\\
\vdots&\ddots&\ddots&\ddots&\vdots\\
0 & \cdots & 0 & -1 & *}~\right| = f
$$
where the polynomial $f$ at the upper right is in fact obtained as in a Horner scheme $f=a_0+x(a_1+x(\cdots(a_{n-2}+x(a_{n-1}+x))\cdots))$.
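This identity can be checked symbolically for small $n$; here is a quick sketch in SymPy (the setup, for $n=4$, is my own illustration):

```python
import sympy as sp

x = sp.symbols('x')
n = 4
a = sp.symbols('a0:4')  # a_0, ..., a_3

# Build the matrix of the determinant displayed above: x on the diagonal,
# -1 on the subdiagonal, and a_0, ..., a_{n-2}, x + a_{n-1} in the last column.
M = sp.zeros(n, n)
for i in range(n):
    M[i, i] += x
    if i > 0:
        M[i, i - 1] = -1
    M[i, n - 1] += a[i]

f = x**n + sum(a[k] * x**k for k in range(n))
assert sp.expand(M.det()) == sp.expand(f)

# The Horner form of f mentioned above agrees as well.
horner = a[0] + x*(a[1] + x*(a[2] + x*(a[3] + x)))
assert sp.expand(horner) == sp.expand(f)
```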
Another method is to expand the determinant along the first row and apply induction on the size. The minor by which the entry $x$ is multiplied is again a determinant of the same form, but for the polynomial $(f-a_0)/x=a_1+a_2x+\cdots+a_{n-1}x^{n-2}+x^{n-1}$, while the coefficient $a_0$ gets multiplied by $(-1)^{n-1}$ times the determinant of an upper triangular matrix of size $n-1$ with all diagonal entries $-1$, which gives $a_0$. The starting case, the matrix of this type for a polynomial $x+a$, is the $1\times1$ matrix with single entry $x+a$. Again the polynomial is built up as in a Horner scheme.
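The recursion behind this induction step can also be verified symbolically; a sketch (the helper name `char_matrix` is mine):

```python
import sympy as sp

x = sp.symbols('x')

def char_matrix(coeffs):
    """The matrix of the determinant above, for
    f = x^n + coeffs[n-1]*x^(n-1) + ... + coeffs[0]."""
    n = len(coeffs)
    M = sp.zeros(n, n)
    for i in range(n):
        M[i, i] += x
        if i > 0:
            M[i, i - 1] = -1
        M[i, n - 1] += coeffs[i]
    return M

a = sp.symbols('a0:4')
# Determinant for f = x^4 + a3*x^3 + a2*x^2 + a1*x + a0 ...
full = char_matrix(list(a)).det()
# ... equals x times the determinant for (f - a0)/x, plus a0:
# the two terms of the first-row expansion.
reduced = char_matrix(list(a[1:])).det()
assert sp.expand(full) == sp.expand(x * reduced + a[0])
```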
Yet another way is to use linearity of the determinant in the last column to write it as
$$
x^n+\left|\matrix{x&0&0&\ldots&a_0\\
-1&x&0&\ldots&a_1\\
0&-1&x&\ldots&a_2\\
\vdots&\ddots&\ddots&\ddots&\vdots\\
0 & \cdots & 0 & -1 & a_{n-1}}\right|
$$
and expand along the last column, observing that the cofactor by which the entry $a_k$ is multiplied is $(-1)^{n-1-k}$ times a minor that has a block decomposition
$\left|\matrix{L&0\\0&U}\right|$, where $L$ is a lower triangular matrix of size $k$ with diagonal entries $x$, and $U$ is an upper triangular matrix of size $n-1-k$ with diagonal entries $-1$. The minor is therefore $x^k(-1)^{n-1-k}$, making the cofactor $(-1)^{n-1-k}\cdot x^k(-1)^{n-1-k}=x^k$, and the characteristic polynomial $x^n+\sum_{k=0}^{n-1}a_kx^k=f$.
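A quick symbolic check of the cofactor claim (my own SymPy sketch, for $n=4$):

```python
import sympy as sp

x = sp.symbols('x')
n = 4
a = sp.symbols('a0:4')

# The matrix remaining after splitting off x^n: x on the diagonal of the
# first n-1 rows, -1 on the subdiagonal, a_0, ..., a_{n-1} in the last column.
M = sp.zeros(n, n)
for i in range(n):
    if i > 0:
        M[i, i - 1] = -1
    M[i, n - 1] = a[i]
    if i < n - 1:
        M[i, i] = x

# The cofactor of each entry a_k in the last column is x^k ...
for k in range(n):
    assert sp.simplify(M.cofactor(k, n - 1) - x**k) == 0

# ... so the full determinant is x^n + sum_k a_k x^k = f.
f = x**n + sum(a[k] * x**k for k in range(n))
assert sp.expand(x**n + M.det()) == sp.expand(f)
```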
Now the elementary proof of the Cayley-Hamilton theorem. Proceed by induction on $n$, the case $n=0$ being trivial.
For $n>0$, take a nonzero vector $v$, and let $V$ be the subspace spanned by its repeated images under the linear transformation $\phi$. This subspace has a basis $v,\phi(v),\ldots,\phi^{d-1}(v)$, where $d=\dim(V)>0$ is the degree of the minimal polynomial $P$ of $v$, that is, of the monic polynomial $P$ of least degree with $P(\phi)(v)=0$.
Extend this to a basis of the whole space; with respect to that basis, $\phi$ has a matrix of the form $M=\left({A\atop0}~{{*}\atop{B}}\right)$, where $A$ is the companion matrix of $P$.
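A concrete instance of this construction may help; the matrix $\phi$ and vector $v$ below are my own illustration (with $d=2<n=3$, so both diagonal blocks are visible):

```python
import sympy as sp

# phi acts on Q^3; take v = e_2.
phi = sp.Matrix([[2, 0, 0],
                 [0, 0, 1],
                 [0, -1, 0]])
v = sp.Matrix([0, 1, 0])

# Krylov vectors v, phi(v), ...: here phi^2(v) = -v, so d = 2 and the
# minimal polynomial of v is P = x^2 + 1.
assert phi**2 * v == -v

# Basis: v, phi(v), extended by e_1; the change-of-basis matrix S has
# these as its columns.
S = sp.Matrix.hstack(v, phi * v, sp.Matrix([1, 0, 0]))
M = S.inv() * phi * S
# M has block form (A *; 0 B), with A the companion matrix of P = x^2 + 1.
assert M == sp.Matrix([[0, -1, 0],
                       [1, 0, 0],
                       [0, 0, 2]])
```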
One has $\chi_M=\chi_A\chi_B$, where $\chi_A=P$, by the computation above. Now one gets zero matrices when evaluating $P$ at $A$ (because $P$ is its minimal polynomial) and, by induction, when evaluating $\chi_B$ at $B$. Thus evaluating $\chi_M=P\cdot\chi_B$ at $M$ gives a matrix product that in block form is $\left({0\atop0}~{{*}\atop{*}}\right)\cdot\left({{*}\atop0}~{{*}\atop0}\right) =\left({0\atop0}~{0\atop0}\right)$. Note that one cannot use the induction hypothesis for $A$: one might have $d=n$, in which case $A$ is no smaller than the case currently being proved (in fact this will happen for "generic" choices of $M$ and $v$). Therefore treating the companion matrix case explicitly is really necessary in this line of reasoning.
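As a sanity check of this block computation, here is a small SymPy example; the specific $M$, with $A$ the companion matrix of $P=x^2+1$, $B=(2)$, and arbitrary entries in the $*$ block, is my own:

```python
import sympy as sp

x = sp.symbols('x')
# M = (A *; 0 B): A = companion matrix of P = x^2 + 1, B = (2).
M = sp.Matrix([[0, -1, 5],
               [1, 0, 7],
               [0, 0, 2]])

PM = M**2 + sp.eye(3)      # P evaluated at M: its left column of blocks is 0
chiBM = M - 2 * sp.eye(3)  # chi_B evaluated at M: its bottom row of blocks is 0
assert PM[:, :2] == sp.zeros(3, 2)
assert chiBM[2, :] == sp.zeros(1, 3)

# chi_M = P * chi_B, and evaluating it at M gives the zero matrix.
assert sp.expand(M.charpoly(x).as_expr() - (x**2 + 1) * (x - 2)) == 0
assert PM * chiBM == sp.zeros(3, 3)
```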