A function $f:\Bbb R^n \rightarrow\Bbb R$ is differentiable at $a$ iff there exist a linear map $L:\Bbb R^n \rightarrow \Bbb R$ and a function $g$ tending to $0$ as its argument tends to $0$ such that:
$$f(a + h) - f(a) = L(h) + g(h)||h||$$
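To check that I'm reading the definition correctly, here is how I think it plays out in the simplest case $n = 1$ (where $||h||$ is just $|h|$), using my own toy example $f(x) = x^2$ at $a = 1$:
$$f(1 + h) - f(1) = (1 + h)^2 - 1 = 2h + h^2,$$
so the definition is satisfied by $L(h) = 2h$ and $g(h) = h^2/|h| = |h|$, which indeed tends to $0$ as $h \to 0$.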
My question is about the purpose of the $||h||$ in that formula.
The gist of the idea is that a differentiable function is one that is locally well approximated by a linear map, so initially I wondered why we need the $||h||$ at all, instead of just:
$$f(a + h) - f(a) = L(h) + g(h)$$
I think I can see why this doesn't work: just set $g(h) = f(a + h) - f(a) - L(h)$. Since $L$ is linear, $L(h) \to 0$ as $h \to 0$, so as long as $f$ is continuous at $a$, this $g$ tends to $0$ as $h$ tends to $0$. This works for any $L$, so every function continuous at $a$ would be "differentiable" there and would be "approximated" by any linear map you like: not very useful. (I work through a concrete instance of this after the list below.)

At least in the one-dimensional case, I can see that the tangent line is the "best" linear approximation, and this question helped me understand why. So clearly the multiplication by $||h||$ does something to eliminate all but the "best" approximation. What I don't understand is:
1. How to prove that the formula given at the top of this post does indeed guarantee you get the "best" $L$.
2. Really, what a proof of that would even look like: I'm still not completely sure what it means to be the "best" approximation (the linked question characterizes "bestness" by comparing to other approximations, but is there an absolute way of characterizing it?).
3. Why multiplying by $||h||$ accomplishes this (though this might be answered at the same time as (1)).
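To make the first and third points concrete (again with my own toy example, so I may well be misusing the definition): take $f(x) = x^2$ at $a = 1$ and the "wrong" linear map $L(h) = 5h$. Without the $||h||$, the error term would be
$$g(h) = f(1 + h) - f(1) - 5h = 2h + h^2 - 5h = -3h + h^2 \to 0 \quad \text{as } h \to 0,$$
so the weaker condition accepts this $L$ even though $5 \neq f'(1) = 2$. With the $||h||$, the same $L$ would need
$$g(h) = \frac{-3h + h^2}{|h|},$$
whose absolute value tends to $3$, not $0$, so this $L$ is rejected; only $L(h) = 2h$ gives $g(h) = h^2/|h| = |h| \to 0$. This is the behaviour I'd like to see proved (and explained) in general.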