This is incredibly easy to prove if you have the following result:
If a function $f$ is differentiable at $a$ then there exist an
$\epsilon > 0$ and a continuous function $\varphi$ defined on the interval
$(-\epsilon,\epsilon)$ such that $\varphi(0)=0$ and
$$ f(a+h) = f(a) + f'(a)h + \varphi(h)h, $$
for all $h \in (-\epsilon,\epsilon)$.
Conversely, if there are constants $b$ and $\alpha$ and a continuous function $\varphi$ with $\varphi(0)=0$ such that
$$ f(a+h) = b + \alpha h + \varphi(h)h, $$
for all $h \in (-\epsilon,\epsilon)$, then $f$ is differentiable at
$a$ with $f(a) = b$ and $f'(a) = \alpha$.
The chain rule follows by direct computation: write $(g \circ f)(a+h) = g(f(a+h))$, use that $f$ is differentiable at $a$ to write $f(a+h)$ as $f(a) + f'(a)h + \varphi_f(h)h$, then set $k = f'(a)h + \varphi_f(h)h$ and use that $g$ is differentiable at $f(a)$.
There's a little bit of bookkeeping needed to make sure that appropriate intervals around $0$ exist for the auxiliary continuous functions, but it's not too bad.
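Spelled out, and only as a sketch of the computation described above: write $b = f(a)$, let $\varphi_f$ on $(-\epsilon_f,\epsilon_f)$ and $\varphi_g$ on $(-\epsilon_g,\epsilon_g)$ be the auxiliary functions for $f$ at $a$ and for $g$ at $b$, and put $k(h) = (f'(a) + \varphi_f(h))h$. The names $\epsilon_f$, $\epsilon_g$, $\delta$ and $\psi$ are labels introduced here for illustration. Since $k$ is continuous with $k(0) = 0$, there is a $\delta$ with $0 < \delta \le \epsilon_f$ such that $|k(h)| < \epsilon_g$ whenever $|h| < \delta$, and for those $h$
$$ \begin{aligned} g(f(a+h)) &= g(b + k(h)) \\ &= g(b) + g'(b)k(h) + \varphi_g(k(h))k(h) \\ &= g(f(a)) + g'(f(a))f'(a)h + \psi(h)h, \end{aligned} $$
where
$$ \psi(h) = g'(b)\varphi_f(h) + \varphi_g(k(h))\bigl(f'(a) + \varphi_f(h)\bigr). $$
The function $\psi$ is continuous on $(-\delta,\delta)$ with $\psi(0) = 0$, so the second half of the result above gives $(g \circ f)'(a) = g'(f(a))f'(a)$.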
The best part about this proof is that it immediately generalizes to functions from $\mathbb R^m$ to $\mathbb R^n$.
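One way to read that generalization, offered as an assumption about the intended setup rather than something stated above: for $f : \mathbb R^m \to \mathbb R^n$, replace $f'(a)$ by the $n \times m$ Jacobian matrix $Df(a)$ and let the auxiliary function $\Phi$ take values in $n \times m$ matrices, continuous with $\Phi(0) = 0$, so that
$$ f(a+h) = f(a) + Df(a)h + \Phi(h)h. $$
For $g : \mathbb R^n \to \mathbb R^p$ differentiable at $b = f(a)$, with $k(h) = (Df(a) + \Phi_f(h))h$, the same computation gives
$$ g(f(a+h)) = g(b) + Dg(b)Df(a)h + \Psi(h)h, \qquad \Psi(h) = Dg(b)\Phi_f(h) + \Phi_g(k(h))\bigl(Df(a) + \Phi_f(h)\bigr), $$
and hence $D(g \circ f)(a) = Dg(f(a))\,Df(a)$. The only change from the one-variable case is that the order of the matrix products now matters.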