
For instance, I've just checked that if you take the best linear approximation (in the $L^2$ sense) to a sufficiently nice function $f$ on the interval $[-\varepsilon, \varepsilon]$, and then let $\varepsilon \to 0$, you get $f(0) + x f'(0)$.
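As a quick numerical sketch of this check (my own illustration; the choice $f = \exp$, the grid size, and the helper names are arbitrary): on a symmetric interval the functions $1$ and $x$ are orthogonal in $L^2$, so the best affine fit $a + bx$ is obtained by projection, $a = \frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} f(x)\,dx$ and $b = \frac{3}{2\varepsilon^3}\int_{-\varepsilon}^{\varepsilon} x f(x)\,dx$.

```python
# Sketch: L^2-best affine approximation a + b*x to f on [-eps, eps],
# using that 1 and x are orthogonal there, so
#   a = (1/(2 eps)) * int f(x) dx,   b = (3/(2 eps^3)) * int x f(x) dx.
# The test function (np.exp) and helper names are illustrative choices.
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def best_l2_affine(f, eps, n=20001):
    x = np.linspace(-eps, eps, n)
    a = trapezoid(f(x), x) / (2 * eps)
    b = trapezoid(x * f(x), x) / (2 * eps**3 / 3)
    return a, b

f = np.exp  # f(0) = 1, f'(0) = 1
for eps in (1.0, 0.1, 0.01):
    a, b = best_l2_affine(f, eps)
    print(f"eps={eps:<5}  a={a:.6f}  b={b:.6f}")  # (a, b) -> (1, 1)
```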

Surely we could make this stronger -- I imagine the analogous statements should hold for, say, the $L^1$ norm as well, or for most reasonable norms. Can we go farther, though?

Question: What is the strongest precise definition we can give the word "best" so that we have a statement of the form "the tangent line is the best linear approximation to a differentiable function"? (Feel free to replace "differentiable" with, say, $C^2$ or something if it makes for a more interesting answer.)

(Note: I'm aware of similar-sounding questions here, such as In what sense is the derivative the "best" linear approximation?, but the answers there don't answer my question.)

  • $\mathcal C^2$ is stronger than differentiable. – GPerez May 11 '15 at 18:39
  • @gperez By "weaken" I meant "state a weaker result," e.g. one that's only valid for $C^2$ functions and not arbitrary differentiable functions. But maybe that's not in line with common usage. I've edited to remove any possible ambiguity. – Daniel McLaury May 11 '15 at 18:40
  • @mattbiesecker: Yes, but that doesn't a priori rule out that your best approximations in various norms could converge to the same thing as $\varepsilon \to 0$. Or do you have an example where you get something other than the tangent line in this scenario? – Daniel McLaury May 11 '15 at 18:41
  • Forget common usage, logic has an answer. So you were right, I just didn't see that you were referring to the result. So Differentiable is a weaker hypothesis than $\mathcal C^2$, but a result on Differentiable functions is stronger than one on $\mathcal C^2$ functions. Nice duality. – GPerez May 11 '15 at 18:49
  • @DanielMcLaury There are weighted norms where it fails but I don't know of a counterexample for the $L^p$ norms. It may very well be true (it seems to be true for any polynomial $f(x)$). – matt biesecker May 11 '15 at 19:00
  • How do you define "best"? Anything close to "best fitting curve" technique? E.g. http://web.iitd.ac.in/~pmvs/courses/mel705/curvefitting.pdf – rtybase May 12 '15 at 22:34
  • @rtybase: The question is about how we can define "best." That said, I'm not sure I see how you can make a meaningful statement here involving linear regression that's not just strictly weaker than the $L^2$ thing I mentioned above. – Daniel McLaury May 13 '15 at 18:21
  • @mattbiesecker: Can you give an example of a weighted norm where it fails? – Daniel McLaury May 13 '15 at 18:23
  • @DanielMcLaury. I retract my earlier conjecture about weighted norms. I checked enough examples (oddball weighted norms, Sobolev norms) to believe it is plausible your $L_2$ result may indeed be true for any norm (at least for $f \in C^1$). – matt biesecker May 13 '15 at 23:16

1 Answer


If $f$ is differentiable at $0$, then the same statement holds with uniform approximation instead of $L_2$.

Proof: Let $T(x) = f(0) + xf'(0)$. By the definition of differentiability we have $$ f(x) - T(x) = o(|x|) $$ and thus $$ \sup_{|x|\le\varepsilon} |f(x) - T(x)| = o(\varepsilon).$$ The best uniform linear approximation on $[-\varepsilon, \varepsilon]$, call it $g_\varepsilon$, satisfies the same bound, since it does at least as well as $T$. In particular, for any mapping $\varepsilon\mapsto x_\varepsilon \in [-\varepsilon, \varepsilon]$ we have $$ |f(x_\varepsilon) - g_\varepsilon(x_\varepsilon)| \le \sup_{|x|\le \varepsilon} |f(x) - g_\varepsilon(x)| \le \sup_{|x|\le \varepsilon} |f(x) - T(x)| = o(\varepsilon).$$ Hence $$g_\varepsilon(0) = f(0) + o(\varepsilon)$$ and, writing $g_\varepsilon'$ for the slope of $g_\varepsilon$, $$ g_\varepsilon' = \frac{g_\varepsilon(\varepsilon) - g_\varepsilon(-\varepsilon)}{2\varepsilon} = \frac{f(\varepsilon) - f(-\varepsilon)}{2\varepsilon} + o(1) \to f'(0).$$
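To illustrate the uniform-norm version numerically (my own sketch, not part of the answer): the minimax affine fit on $[-\varepsilon, \varepsilon]$ can be approximated on a grid by solving a small linear program, and its coefficients indeed approach $(f(0), f'(0))$. The test function $\exp$, the grid size, and the SciPy `linprog` formulation are my choices.

```python
# Sketch (assumes SciPy is available): approximate the best *uniform* affine fit
# a + b*x to f on [-eps, eps] as the linear program
#   minimize t  subject to  |y_i - (a + b*x_i)| <= t  at every grid point,
# then check that (a, b) approaches (f(0), f'(0)).
import numpy as np
from scipy.optimize import linprog

def best_uniform_affine(f, eps, n=401):
    x = np.linspace(-eps, eps, n)
    y = f(x)
    ones = np.ones((n, 1))
    # Variables (a, b, t); two blocks of rows encode the +/- residual bounds.
    A_ub = np.block([[-ones, -x[:, None], -ones],
                     [ ones,  x[:, None], -ones]])
    b_ub = np.concatenate([-y, y])
    c = np.array([0.0, 0.0, 1.0])  # minimize t only
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None)])
    a, b, _ = res.x
    return a, b

f = np.exp  # f(0) = 1, f'(0) = 1
for eps in (1.0, 0.1, 0.01):
    a, b = best_uniform_affine(f, eps)
    print(f"eps={eps:<5}  a={a:.6f}  b={b:.6f}")  # should approach (1, 1)
```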

For the averaged $L_p$ norm, $1\le p < \infty$:

Fix some $p\in [1,\infty)$. For any measurable $\phi$ denote its averaged $L_p$ norm by $$N_\varepsilon \phi = \sqrt[p]{\frac{1}{2\varepsilon} \int_{-\varepsilon}^{\varepsilon} |\phi(x)|^p\,dx}.$$ Let $g_\varepsilon$ be an $L_p$-best linear approximation on $[-\varepsilon, \varepsilon]$. Since $N_\varepsilon$ is dominated by the sup norm, the uniform bound above gives $$ N_\varepsilon (f - g_\varepsilon) \le N_\varepsilon (f - T) = o(\varepsilon), $$ and by the triangle inequality $$ N_\varepsilon (g_\varepsilon - T) \le N_\varepsilon (f - g_\varepsilon) + N_\varepsilon (f - T) = o(\varepsilon). $$ Because the affine functions form a two-dimensional space, this forces $g_\varepsilon(0) \to f(0)$ and $g_\varepsilon' \to f'(0)$; a rescaling argument making this precise is sketched below.
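To spell out why $N_\varepsilon(g_\varepsilon - T) = o(\varepsilon)$ forces the coefficients to converge (this completion is my addition; the notation $a_\varepsilon$, $b_\varepsilon$, $\|\cdot\|_*$ is not from the answer), rescale to the fixed interval $[-1, 1]$. Writing $g_\varepsilon(x) - T(x) = a_\varepsilon + b_\varepsilon x$ and substituting $x = \varepsilon t$,
$$ N_\varepsilon(g_\varepsilon - T) = \sqrt[p]{\frac12 \int_{-1}^{1} |a_\varepsilon + \varepsilon b_\varepsilon t|^p \, dt} =: \big\| (a_\varepsilon, \varepsilon b_\varepsilon) \big\|_*. $$
Here $\|\cdot\|_*$ is a norm on $\mathbb R^2$, hence equivalent to $|a| + |b|$: there is a constant $C_p$ with $|a| + |b| \le C_p \|(a,b)\|_*$. Therefore
$$ |a_\varepsilon| + \varepsilon |b_\varepsilon| \le C_p \, N_\varepsilon(g_\varepsilon - T) = o(\varepsilon), $$
so $a_\varepsilon = o(\varepsilon)$ and $b_\varepsilon = o(1)$, i.e. $g_\varepsilon(0) \to f(0)$ and $g_\varepsilon' \to f'(0)$.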

user251257