I found this answer, which states the following theorem:
Theorem: Let $f$ be a real-valued function defined in a neighbourhood of a point $a$ and continuous at $a$, and let us assume that it is approximated by a linear function given by $g(x) = mx + c$ in the neighbourhood of $a$. Then we say that $g$ is the best linear approximation of $f$ in the neighbourhood of $a$ if the following equation holds:
$$ \lim_{x\to a} \frac{f(x)-g(x)}{x-a}=0$$
Such a linear approximation exists if and only if $f'(a)$ exists, and moreover in that case we have $g(x) = f(a) + f'(a)(x-a)$.
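To make the theorem concrete, here is a small check I worked out myself (it is not from the linked answer). Take $f(x) = x^2$ at $a = 1$, so $f'(1) = 2$ and the theorem's candidate is $g(x) = f(1) + f'(1)(x-1) = 2x - 1$. Then indeed
$$ \lim_{x\to 1} \frac{f(x)-g(x)}{x-1} = \lim_{x\to 1} \frac{x^2-(2x-1)}{x-1} = \lim_{x\to 1} \frac{(x-1)^2}{x-1} = \lim_{x\to 1}(x-1) = 0. $$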
This answer also uses the theorem to prove that the derivative is truly the best linear approximation, or rather, that this is the 'sense' in which it is the best approximation.
After researching online, I found that the idea seems to be that the derivative gives the only linear approximation for which the approximation error tends to $0$ faster than $x-a$ as $x \to a$, and based on this we call it the best approximation.
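Continuing my own example from above: if I instead take any other line through the same point, say $g_m(x) = 1 + m(x-1)$ with $m \neq 2$, then
$$ \frac{f(x)-g_m(x)}{x-1} = \frac{(x-1)^2+(2-m)(x-1)}{x-1} = (x-1)+(2-m) \;\longrightarrow\; 2-m \neq 0 \quad \text{as } x \to 1, $$
so the error shrinks only like $x-1$ itself, not faster. This seems consistent with the claim, but I do not see how to turn it into a general argument.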
My question is: how does this actually prove that the derivative beats any other linear approximation? How does it formally (and, if possible, intuitively) show that the derivative is better than all the other approximations?