Let $f:\mathbb{R}^n \to \mathbb{R}^m$ be a differentiable function. After reading this answer by DLeMeur, I understand why, for $n=m=1$, the derivative is the "best" linear approximation (although it's irritating how few textbooks define this term). I want to try to extend that argument to functions of several variables.
So let's try to approximate $f$ at a point $a$. We have the usual linear approximation (I suppose affine is the technically correct term) coming from the derivative, call it $$L_1(x) = Df(a)(x-a) + f(a),$$ where $Df(a)$ is the Jacobian matrix of $f$ at $a$. Now suppose $L_2(x) = C(a)(x-a) + f(a)$ for some $m \times n$ matrix $C(a)$ with $C(a) \neq Df(a)$.
In order to adapt the method shown in the linked answer I need to show that $$\lim_{x\to a} \frac{||f(x) - L_1(x)||}{||x-a||}=0 \hspace{1.5cm}(1)$$ and
$$\lim_{x\to a} \frac{||f(x) - L_2(x)||}{||x-a||} \neq 0 \hspace{1.5cm}(2)$$ and that this limit exists as a finite number. Then, by taking the quotient and applying the standard limit theorems, I can conclude that $$\lim_{x\to a} \frac{||f(x)-L_1(x)||}{||f(x) - L_2(x)||} = 0 \hspace{1.5cm}(3)$$ In other words, the error between $f$ and $L_1$ decreases to zero faster than the error for any other reasonable linear approximation at $a$.
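As a numerical sanity check of $(1)$–$(3)$, here is a quick experiment with a concrete example (the particular $f$, $a$, and $C(a)$ below are my own choices for illustration, and I only approach $a$ along one fixed direction):

```python
import numpy as np

# Illustrative choices (not from any particular source):
# f(x, y) = (x*y, x^2 + y^2), evaluated near a = (1, 2).
def f(p):
    x, y = p
    return np.array([x * y, x**2 + y**2])

a = np.array([1.0, 2.0])
Df = np.array([[2.0, 1.0],    # Jacobian [[y, x], [2x, 2y]] at a = (1, 2)
               [2.0, 4.0]])
C = Df + np.eye(2)            # a competing matrix with C != Df(a)

def L1(p): return Df @ (p - a) + f(a)   # derivative-based approximation
def L2(p): return C  @ (p - a) + f(a)   # competing affine approximation

v = np.array([0.6, 0.8])      # fixed unit direction of approach
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    p = a + t * v
    r1 = np.linalg.norm(f(p) - L1(p)) / t   # ratio in (1): tends to 0
    r2 = np.linalg.norm(f(p) - L2(p)) / t   # ratio in (2): tends to ||(Df - C)v|| = 1 here
    print(f"t={t:.0e}  (1)-ratio={r1:.6f}  (2)-ratio={r2:.6f}  (3)-ratio={r1 / r2:.6f}")
```

Along this direction the $(1)$-ratio shrinks linearly in $t$, the $(2)$-ratio settles near $||(Df(a)-C(a))v|| = 1$, and so the $(3)$-ratio goes to $0$, which is the behaviour I am trying to prove in general.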
It is easy to show $(1)$, as this is just the definition of differentiability at $a$. However, I am having trouble with $(2)$.
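For what it's worth, subtracting the two approximations gives an identity relating the two errors (it follows directly from the definitions above):

$$f(x) - L_2(x) = \bigl(f(x) - L_1(x)\bigr) + \bigl(Df(a) - C(a)\bigr)(x-a),$$

so $(2)$ seems to come down to understanding $\dfrac{||(Df(a) - C(a))(x-a)||}{||x-a||}$ as $x \to a$, but I don't see how to finish from here.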
The issue is that the norms prevent me from using the strategy in the linked answer. Is there a better way to go about this? I really want result $(3)$, as it is the only statement that makes sense to me regarding the idea of the derivative providing the best "linear approximation".
Thanks