A function $f:\Bbb R^n \rightarrow\Bbb R$ is differentiable at $a$ iff there exist a linear map $L:\Bbb R^n \rightarrow \Bbb R$ and a function $g$ tending to $0$ as its argument tends to $0$ such that:
$$f(a + h) - f(a) = L(h) + g(h)||h||$$
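To check that I'm reading the definition correctly, here is how I think it plays out in the simplest case $n = 1$ (where $||h||$ is just $|h|$), using my own toy example $f(x) = x^2$ at $a = 1$:
$$f(1 + h) - f(1) = (1 + h)^2 - 1 = 2h + h^2,$$
so the definition is satisfied by $L(h) = 2h$ and $g(h) = h^2/|h| = |h|$, which indeed tends to $0$ as $h \to 0$.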
My question is about the purpose of the $||h||$ in that formula.
The gist of the idea is that a differentiable function is one that is locally well approximated by a linear map, so initially I wondered why we need the $||h||$ at all, instead of just:
$$f(a + h) - f(a) = L(h) + g(h)$$
I think I can see why this doesn't work: just set $g(h) = f(a + h) - f(a) - L(h)$. Since $L$ is linear, $L(h) \to 0$ as $h \to 0$, so as long as $f$ is continuous at $a$, this $g$ tends to $0$ as $h$ tends to $0$. This works for any $L$, so every function continuous at $a$ would be "differentiable" there and would be "approximated" by any linear map you like: not very useful. (I work through a concrete instance of this after the list below.)

At least in the one-dimensional case, I can see that the tangent line is the "best" linear approximation, and this question helped me understand why. So clearly the multiplication by $||h||$ does something to eliminate all but the "best" approximation. What I don't understand is:
1. How to prove that the formula given at the top of this post does indeed guarantee you get the "best" $L$.
2. Really, what a proof of that would even look like: I'm still not completely sure what it means to be the "best" approximation (the linked question characterizes "bestness" by comparing to other approximations, but is there an absolute way of characterizing it?).
3. Why multiplying by $||h||$ accomplishes this (though this might be answered at the same time as (1)).
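To make the first and third points concrete (again with my own toy example, so I may well be misusing the definition): take $f(x) = x^2$ at $a = 1$ and the "wrong" linear map $L(h) = 5h$. Without the $||h||$, the error term would be
$$g(h) = f(1 + h) - f(1) - 5h = 2h + h^2 - 5h = -3h + h^2 \to 0 \quad \text{as } h \to 0,$$
so the weaker condition accepts this $L$ even though $5 \neq f'(1) = 2$. With the $||h||$, the same $L$ would need
$$g(h) = \frac{-3h + h^2}{|h|},$$
whose absolute value tends to $3$, not $0$, so this $L$ is rejected; only $L(h) = 2h$ gives $g(h) = h^2/|h| = |h| \to 0$. This is the behaviour I'd like to see proved (and explained) in general.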