I was looking for a straightforward answer to this question, but most of the proofs and theorems I found were too abstract or relied on rational canonical forms, Jordan blocks and the like. Can we not prove the statement
The minimal and characteristic polynomials of an $n\times n$ matrix $A$ coincide if and only if the set $\{I,A,A^2,\dots,A^{n-1}\}$ is linearly independent
using just Cayley-Hamilton?
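To make the statement concrete, here are two standard $2\times 2$ examples, writing $p$ for the characteristic polynomial and $q$ for the minimal polynomial:
$$A = I_2:\qquad p(t) = (t-1)^2,\quad q(t) = t-1,\quad q(A) = A - I = 0,$$
so $\{I, A\}$ is linearly dependent and $q \ne p$.
$$N = \begin{pmatrix}0&1\\0&0\end{pmatrix}:\qquad \alpha_0 I + \alpha_1 N = \begin{pmatrix}\alpha_0 & \alpha_1\\ 0 & \alpha_0\end{pmatrix} = 0 \implies \alpha_0 = \alpha_1 = 0,$$
so $\{I, N\}$ is linearly independent, and indeed $q(t) = p(t) = t^2$, since $N \ne 0$ but $N^2 = 0$.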
Here is an attempt:
Let $A : \mathbb{R}^n \to \mathbb{R}^n$ be a linear operator and let $p(t) = \sum\limits_{i=0}^{n} c_i t^i$ be its characteristic polynomial. By Cayley-Hamilton we know $p(A) = 0$. Assume the set $\{I, A, A^2, \dots, A^{n-1}\}$ is linearly dependent. Then by definition $\sum\limits_{i=0}^{n-1} \alpha_i A^i = 0$, where not all $\alpha_i$ are zero. Let $k \leqslant n-1$ be the largest index with $\alpha_k \neq 0$; dividing by $\alpha_k$, we can rewrite the relation as $A^k = \sum\limits_{i=0}^{k-1} \beta_i A^i$ with $\beta_i = -\alpha_i/\alpha_k$. Then every term of degree $j \geqslant k$ in $p(A)$ can be replaced with a polynomial in $A$ of degree $j-1$, since $$A^{k+1} = A A^k = A\sum_{i=0}^{k-1}\beta_i A^i = \sum_{i=0}^{k-1}\beta_i A^{i+1} = \sum_{i=1}^{k}\beta_{i-1}A^i.$$ In particular, this applies to the highest-order term. Therefore there exists a polynomial $q$ with $q(A) = 0$ and $\deg(q) < \deg(p)$, so $p$ cannot be the minimal polynomial.
That completes one direction: we showed that linear dependence implies the two polynomials differ.
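As a numerical sanity check (not part of the proof), here is a minimal Python sketch of the idea behind the first direction: it looks for the first power $A^k$ that lies in the span of $I, A, \dots, A^{k-1}$, which gives the degree of the minimal polynomial. The helper names `mat_mul`, `rank`, `power_vectors`, `powers_independent` and `minimal_degree` are hypothetical, chosen for this illustration, and the sketch assumes integer or rational entries so that `fractions.Fraction` gives exact ranks. Since the minimal polynomial divides the (monic) characteristic polynomial, the two coincide exactly when this degree equals $n$.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(vectors):
    """Rank of a list of equal-length vectors (exact Gaussian elimination)."""
    rows = [list(v) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every row below the pivot row.
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def power_vectors(a, count):
    """I, A, A^2, ..., A^(count-1), each flattened to a vector of length n^2."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    current = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    out = []
    for _ in range(count):
        out.append([x for row in current for x in row])
        current = mat_mul(current, a)
    return out


def powers_independent(a):
    """True iff {I, A, ..., A^(n-1)} is linearly independent."""
    n = len(a)
    return rank(power_vectors(a, n)) == n


def minimal_degree(a):
    """Smallest k such that A^k lies in span{I, A, ..., A^(k-1)}.

    This is the degree of the minimal polynomial; Cayley-Hamilton
    guarantees that some k <= n works.
    """
    n = len(a)
    vecs = power_vectors(a, n + 1)
    for k in range(1, n + 1):
        if rank(vecs[:k + 1]) < k + 1:
            return k
    raise AssertionError("unreachable by Cayley-Hamilton")


examples = {
    "identity": [[1, 0], [0, 1]],
    "nilpotent block": [[0, 1], [0, 0]],
    "diag(1, 1, 2)": [[1, 0, 0], [0, 1, 0], [0, 0, 2]],
}
for name, m in examples.items():
    # The statement predicts that the two boolean columns always agree.
    print(name, powers_independent(m), minimal_degree(m) == len(m))
```

Exact rational arithmetic is used instead of floating point so that the rank test does not depend on a numerical tolerance.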
In the other direction, let $q$ be the minimal polynomial and assume that $q \ne p$. Then necessarily $\deg(q) < n = \deg(p)$. By the definition of the minimal polynomial, $q(A) = \sum\limits_{j=0}^{k} d_j A^j = 0$ with $k = \deg(q) < n$ and $d_k = 1$. But this is a nontrivial linear combination of $I, A, \dots, A^k$ that sums to zero, and therefore $\{I,A,A^2,\dots,A^{n-1}\}$ cannot be linearly independent.
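The step $\deg(q) < n$ can be justified using only Cayley-Hamilton and the definition of $q$ as the monic polynomial of least degree with $q(A) = 0$, taking $p(t) = \det(tI - A)$ so that $p$ is monic. Divide $p$ by $q$:
$$p(t) = s(t)\,q(t) + r(t), \qquad \deg(r) < \deg(q),$$
$$r(A) = p(A) - s(A)\,q(A) = 0 - 0 = 0.$$
By minimality of $q$ this forces $r = 0$, so $q$ divides $p$. If $\deg(q) = n = \deg(p)$, then $s$ is a constant, and comparing leading coefficients of the monic $p$ and $q$ gives $s = 1$, i.e. $q = p$. Hence $q \ne p$ implies $\deg(q) < n$.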
Is the proof correct? My second direction feels like I just rewrote the first.