
For instance, I've just checked that if you take the best linear approximation (in the $L^2$ sense) to a sufficiently nice function $f$ on the interval $[-\varepsilon, \varepsilon]$, and then let $\varepsilon \to 0$, you get $f(0) + x f'(0)$.
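As a quick numerical sketch of this check (my own illustration; the choice $f = \exp$, the grid size, and the helper names are arbitrary): on a symmetric interval the functions $1$ and $x$ are orthogonal in $L^2$, so the best affine fit $a + bx$ is obtained by projection, $a = \frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} f(x)\,dx$ and $b = \frac{3}{2\varepsilon^3}\int_{-\varepsilon}^{\varepsilon} x f(x)\,dx$.

```python
# Sketch: L^2-best affine approximation a + b*x to f on [-eps, eps],
# using that 1 and x are orthogonal there, so
#   a = (1/(2 eps)) * int f(x) dx,   b = (3/(2 eps^3)) * int x f(x) dx.
# The test function (np.exp) and helper names are illustrative choices.
import numpy as np

def trapezoid(y, x):
    """Composite trapezoid rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) / 2 * np.diff(x)))

def best_l2_affine(f, eps, n=20001):
    x = np.linspace(-eps, eps, n)
    a = trapezoid(f(x), x) / (2 * eps)
    b = trapezoid(x * f(x), x) / (2 * eps**3 / 3)
    return a, b

f = np.exp  # f(0) = 1, f'(0) = 1
for eps in (1.0, 0.1, 0.01):
    a, b = best_l2_affine(f, eps)
    print(f"eps={eps:<5}  a={a:.6f}  b={b:.6f}")  # (a, b) -> (1, 1)
```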

Surely we could make this stronger -- I imagine the analogous statements should hold for, say, the $L^1$ norm as well, or for most reasonable norms. Can we go farther, though?

Question: What is the strongest precise definition we can give the word "best" so that we have a statement of the form "the tangent line is the best linear approximation to a differentiable function"? (Feel free to replace "differentiable" with, say, $C^2$ or something if it makes for a more interesting answer.)

(Note: I'm aware of similar-sounding questions here, such as In what sense is the derivative the "best" linear approximation?, but the answers there don't answer my question.)

  • $\mathcal C^2$ is stronger than differentiable. – GPerez May 11 '15 at 18:39
  • @gperez By "weaken" I meant "state a weaker result," e.g. one that's only valid for $C^2$ functions and not arbitrary differentiable functions. But maybe that's not in line with common usage. I've edited to remove any possible ambiguity. – Daniel McLaury May 11 '15 at 18:40
  • @mattbiesecker: Yes, but that doesn't a priori rule out that your best approximations in various norms could converge to the same thing as $\varepsilon \to 0$. Or do you have an example where you get something other than the tangent line in this scenario? – Daniel McLaury May 11 '15 at 18:41
  • Forget common usage, logic has an answer. So you were right, I just didn't see that you were referring to the result. So Differentiable is a weaker hypothesis than $\mathcal C^2$, but a result on Differentiable functions is stronger than one on $\mathcal C^2$ functions. Nice duality. – GPerez May 11 '15 at 18:49
  • @DanielMcLaury There are weighted norms where it fails but I don't know of a counterexample for the $L^p$ norms. It may very well be true (it seems to be true for any polynomial $f(x)$). – matt biesecker May 11 '15 at 19:00
  • How do you define "best"? Anything close to "best fitting curve" technique? E.g. http://web.iitd.ac.in/~pmvs/courses/mel705/curvefitting.pdf – rtybase May 12 '15 at 22:34
  • @rtybase: The question is about how we can define "best." That said, I'm not sure I see how you can make a meaningful statement here involving linear regression that's not just strictly weaker than the $L^2$ thing I mentioned above. – Daniel McLaury May 13 '15 at 18:21
  • @mattbiesecker: Can you give an example of a weighted norm where it fails? – Daniel McLaury May 13 '15 at 18:23
  • @DanielMcLaury. I retract my earlier conjecture about weighted norms. I checked enough examples (oddball weighted norms, Sobolev norms) to believe it is plausible your $L_2$ result may indeed be true for any norm (at least for $f \in C^1$). – matt biesecker May 13 '15 at 23:16

1 Answer


If $f$ is differentiable at $0$, then the same statement holds with uniform approximation instead of $L_2$.

Proof: Let $T(x) = f(0) + xf'(0)$. By the definition of differentiability we have $$ f(x) - T(x) = o(|x|) $$ and thus $$ \sup_{|x|\le\varepsilon} |f(x) - T(x)| = o(\varepsilon).$$ The best uniform linear approximation on $[-\varepsilon, \varepsilon]$, call it $g_\varepsilon$, satisfies the same bound, since it does at least as well as $T$. In particular, for any mapping $\varepsilon\mapsto x_\varepsilon \in [-\varepsilon, \varepsilon]$ we have $$ |f(x_\varepsilon) - g_\varepsilon(x_\varepsilon)| \le \sup_{|x|\le \varepsilon} |f(x) - g_\varepsilon(x)| \le \sup_{|x|\le \varepsilon} |f(x) - T(x)| = o(\varepsilon).$$ Hence $$g_\varepsilon(0) = f(0) + o(\varepsilon)$$ and, writing $g_\varepsilon'$ for the slope of $g_\varepsilon$, $$ g_\varepsilon' = \frac{g_\varepsilon(\varepsilon) - g_\varepsilon(-\varepsilon)}{2\varepsilon} = \frac{f(\varepsilon) - f(-\varepsilon)}{2\varepsilon} + o(1) \to f'(0).$$
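To illustrate the uniform-norm version numerically (my own sketch, not part of the answer): the minimax affine fit on $[-\varepsilon, \varepsilon]$ can be approximated on a grid by solving a small linear program, and its coefficients indeed approach $(f(0), f'(0))$. The test function $\exp$, the grid size, and the SciPy `linprog` formulation are my choices.

```python
# Sketch (assumes SciPy is available): approximate the best *uniform* affine fit
# a + b*x to f on [-eps, eps] as the linear program
#   minimize t  subject to  |y_i - (a + b*x_i)| <= t  at every grid point,
# then check that (a, b) approaches (f(0), f'(0)).
import numpy as np
from scipy.optimize import linprog

def best_uniform_affine(f, eps, n=401):
    x = np.linspace(-eps, eps, n)
    y = f(x)
    ones = np.ones((n, 1))
    # Variables (a, b, t); two blocks of rows encode the +/- residual bounds.
    A_ub = np.block([[-ones, -x[:, None], -ones],
                     [ ones,  x[:, None], -ones]])
    b_ub = np.concatenate([-y, y])
    c = np.array([0.0, 0.0, 1.0])  # minimize t only
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None), (None, None), (0, None)])
    a, b, _ = res.x
    return a, b

f = np.exp  # f(0) = 1, f'(0) = 1
for eps in (1.0, 0.1, 0.01):
    a, b = best_uniform_affine(f, eps)
    print(f"eps={eps:<5}  a={a:.6f}  b={b:.6f}")  # should approach (1, 1)
```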

For the averaged $L_p$ norm, $1\le p < \infty$:

Fix some $p\in [1,\infty)$. For any measurable $\phi$ denote its averaged $L_p$ norm by $$N_\varepsilon \phi = \sqrt[p]{\frac{1}{2\varepsilon} \int_{-\varepsilon}^{\varepsilon} |\phi(x)|^p\,dx}.$$ Let $g_\varepsilon$ be an $L_p$-best linear approximation on $[-\varepsilon, \varepsilon]$. Since $N_\varepsilon$ is dominated by the sup norm, the uniform bound above gives $$ N_\varepsilon (f - g_\varepsilon) \le N_\varepsilon (f - T) = o(\varepsilon), $$ and by the triangle inequality $$ N_\varepsilon (g_\varepsilon - T) \le N_\varepsilon (f - g_\varepsilon) + N_\varepsilon (f - T) = o(\varepsilon). $$ Because the affine functions form a two-dimensional space, this forces $g_\varepsilon(0) \to f(0)$ and $g_\varepsilon' \to f'(0)$; a rescaling argument making this precise is sketched below.
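To spell out why $N_\varepsilon(g_\varepsilon - T) = o(\varepsilon)$ forces the coefficients to converge (this completion is my addition; the notation $a_\varepsilon$, $b_\varepsilon$, $\|\cdot\|_*$ is not from the answer), rescale to the fixed interval $[-1, 1]$. Writing $g_\varepsilon(x) - T(x) = a_\varepsilon + b_\varepsilon x$ and substituting $x = \varepsilon t$,
$$ N_\varepsilon(g_\varepsilon - T) = \sqrt[p]{\frac12 \int_{-1}^{1} |a_\varepsilon + \varepsilon b_\varepsilon t|^p \, dt} =: \big\| (a_\varepsilon, \varepsilon b_\varepsilon) \big\|_*. $$
Here $\|\cdot\|_*$ is a norm on $\mathbb R^2$, hence equivalent to $|a| + |b|$: there is a constant $C_p$ with $|a| + |b| \le C_p \|(a,b)\|_*$. Therefore
$$ |a_\varepsilon| + \varepsilon |b_\varepsilon| \le C_p \, N_\varepsilon(g_\varepsilon - T) = o(\varepsilon), $$
so $a_\varepsilon = o(\varepsilon)$ and $b_\varepsilon = o(1)$, i.e. $g_\varepsilon(0) \to f(0)$ and $g_\varepsilon' \to f'(0)$.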

user251257