Following the comments of Didier and studying his answer I was able to prove it coordinate free with the trace definition. Because I like doing Riemannian geometry coordinate free and find the trace definition of the divergence much more natural I will also give this proof. We start by restating the definition.
Definition Let $X$ be a vectorfield on $M$ and $A_1,...,A_n$ a basis of $TM$ with duals $A^1,...,A^n$ such that $A^i(A_j) = \delta^i_j$. Then the covariant divergence of $X$ is
$$
\text{div}(X) = A^i(\nabla_{A_i}X).
$$
Next we will prove an auxiliary claim that is used in the computation of the proof.
Claim If $E_1,...,E_n$ is an orthonormal basis (where the duals $E^1,...,E^n$ are $E^i = \langle E_i, \cdot\rangle)$ then
$$
\mathcal{L}_X(E^i) = \langle \nabla_XE_i,\cdot\rangle + E^i(\nabla_\cdot X).
$$
Proof. Let $Y$ be any vectorfield. By the Leibnitz rule we have that
\begin{align}
\mathcal{L}_X(E^i)(Y) &= \mathcal{L}_X(E^i(Y)) - E^i(\mathcal{L}_XY)\\
&=X(E^i(Y)) - E^i([X,Y])\\
&=X(E^i(Y)) - E^i(\nabla_XY-\nabla_YX)\\
&=X\langle E_i,Y \rangle - \langle E_i,\nabla_XY\rangle + \langle E_i,\nabla_YX\rangle\\
&=\langle \nabla_XE_i,Y\rangle + \langle E_i,\nabla_XY\rangle -\langle E_i,\nabla_XY\rangle + \langle E_i,\nabla_YX\rangle\\
&=\langle \nabla_XE_i,Y\rangle +\langle E_i,\nabla_YX\rangle.
\end{align}
Thus
$$
\mathcal{L}_X(E^i) = \langle \nabla_XE_i,\cdot\rangle +\langle E_i,\nabla_\cdot X\rangle
$$
which was to show.$\Box$
Proposition Let $\text{vol}$ be the volume form on $M$. Then
$$
\mathcal{L}_X(\text{vol})=\text{div}(X)\text{vol}
$$
Proof. We choose an orthonormal basis $E_1,...,E_n$. The duals $E^1,...,E^n$ are $E^i = \langle E_i, \cdot\rangle$ and the volume form is $\text{vol} = E^1 \wedge ... \wedge E^n$. Now we compute
\begin{align}
\mathcal{L}_X(\text{vol}) &= \mathcal{L}_X(E^1 \wedge ... \wedge E^n)\\
&= \sum_{i=1}^{n}E^1 \wedge ...\wedge \mathcal{L}_X(E^i)\wedge... \wedge E^n\\
&= \sum_{i=1}^{n}E^1 \wedge ...\wedge (\langle\nabla_XE_i,\cdot\rangle + E^i(\nabla_\cdot X))\wedge... \wedge E^n\\
\end{align}
where we used the claim. Note that applying $X$ to $1 = \langle E_i,E_i\rangle$ yields
$$
0 = X\langle E_i,E_i\rangle = 2\langle\nabla_XE_i,E_i\rangle
$$
and therefore $\langle\nabla_XE_i,E_i\rangle = 0$. This means that when we write this 1-form in the basis $\langle\nabla_XE_i,\cdot\rangle = \alpha_j E^j$ the i-th coefficient is zero and thus in the above wedge product we have a term $E^k \wedge E^k$ for each non-vanishing coefficient $\alpha_k \neq 0$ and therefore this 1-form does not contribute to the wedge product and so
$$
\mathcal{L}_X(\text{vol}) = \sum_{i=1}^{n}E^1 \wedge ...\wedge E^i(\nabla_\cdot X)\wedge... \wedge E^n.
$$
The form at hand is of top dimension. Hence there exists a coefficient $\alpha \in C^\infty(M)$ such that
$$
\sum_{i=1}^{n}E^1 \wedge ...\wedge E^i(\nabla_\cdot X)\wedge... \wedge E^n = \alpha E^1 \wedge ... \wedge E^n.
$$
We will determine this coefficient by evaluating the right and left hand sides on $E_1,...,E_n$. For the RHS we get
$$
\alpha E^1 \wedge ... \wedge E^n(E_1,...,E_n) = \alpha.
$$
The LHS evaluates to
\begin{align}
\text{LHS} & = \sum_{i=1}^n\det\left(\begin{array}{ccc}E^1(E_1) & ... & E^n(E_1)\\
...& E^i(\nabla_{E_i}X)& ...\\
E^1(E_n) & ... & E^{n}(E_n) \\\end{array}\right)\\
&=\sum_{i=1}^{n}E^i(\nabla_{E_i}X)\\
&=\text{div}(X).
\end{align}
Hence $\alpha = \text{div}(X)$ and so we have shown that
\begin{align}
\mathcal{L}_X(\text{vol}) &= \sum_{i=1}^{n}E^1 \wedge ...\wedge E^i(\nabla_\cdot X)\wedge... \wedge E^n\\
& = \text{div}(X)E^1\wedge ... \wedge E^n\\
& = \text{div}(X)\text{vol}
\end{align}
which is the result.$\Box$