
Suppose a function $f: D \subseteq \Bbb R^m \rightarrow \mathbb R$ is defined on an open subset $D \subseteq \Bbb R^m$, and let $p \in D$. We say that $f$ is differentiable at $p$ if there exist a linear function $L: \Bbb R^m \rightarrow \Bbb R$ and a function $\eta: D_p \subseteq \Bbb R^m \rightarrow \Bbb R$ such that $$f(p+h)-f(p)=L(h)+ ||h|| \eta (h)$$ for all $h \in D_p$, where $\lim_{||h|| \rightarrow 0 }\eta(h)=0$ and $D_p=\{h \in \Bbb R^m:p+h \in D\}$.
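To see the definition in action, here is a minimal numerical sketch. The function $f(x,y)=x^2+3y$, the point $p=(1,2)$ and the linear map $L(h)=2h_1+3h_2$ are illustrative assumptions, not part of the question. The code computes $\eta(h) = \big(f(p+h)-f(p)-L(h)\big)/\|h\|$ for shrinking $h$; for this example $\eta(h)=h_1^2/\|h\|$, so it should go to $0$.

```python
import math

def f(x, y):
    # Example function (an illustrative assumption): f(x, y) = x^2 + 3y
    return x**2 + 3*y

def L(h1, h2):
    # Candidate linear map at p = (1, 2); (2, 3) is the gradient of f there
    return 2*h1 + 3*h2

def eta(p, h):
    """Error term eta(h) = (f(p+h) - f(p) - L(h)) / ||h|| from the definition."""
    norm_h = math.hypot(h[0], h[1])
    delta_f = f(p[0] + h[0], p[1] + h[1]) - f(p[0], p[1])
    return (delta_f - L(h[0], h[1])) / norm_h

p = (1.0, 2.0)
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Shrink h along the diagonal direction (1, 1); here eta equals s / sqrt(2)
    print(s, eta(p, (s, s)))
```

Any other candidate $L$ (for instance $L(h)=2h_1$) leaves a term that does not vanish when divided by $\|h\|$, which is why the definition pins $L$ down uniquely.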

I wanted to understand the motivation behind this definition. Here's what I could think:

$\lim_{h \rightarrow 0} \dfrac{f(p+h)-f(p)}{||h||}$ represents the derivative at $p$

$\dfrac{f(p+h)-f(p)}{||h||}$ can be approximated as $f'(p)+\eta(h)$ where $\eta(h)$ represents an error function.

Thus, $f(p+h)-f(p)= ||h||f'(p) + ||h|| \eta(h)$.

Now, our definition calls $||h|| f'(p)$ a linear function $L(h)$.

If we could prove that $||h|| f'(p)$ is linear, then this argument would be sufficient motivation for this definition of differentiability of the function $f$.

But: for $\lambda \in \Bbb R: ||h_1+\lambda h_2|| f'(p) \ne ||h_1||f'(p) + |\lambda|~||h_2||f'(p)$ and hence not linear !!
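The failure of linearity is easy to confirm numerically. A minimal sketch, assuming $m=2$ and a hypothetical constant $c$ standing in for the number $f'(p)$: the map $g(h)=c\,\|h\|$ fails the test $g(h_1+\lambda h_2)=g(h_1)+\lambda g(h_2)$, while the map $h\mapsto\langle a,h\rangle$ for a fixed (hypothetical) vector $a$ passes it.

```python
import math

c = 2.5          # hypothetical stand-in for the number f'(p)
a = (2.0, 3.0)   # hypothetical vector defining a dot-product map

def g(h):
    # The map h -> ||h|| * c considered in the question
    return c * math.hypot(*h)

def dot_map(h):
    # A genuinely linear map h -> <a, h>
    return a[0]*h[0] + a[1]*h[1]

def is_linear_on(func, h1, h2, lam, tol=1e-12):
    """Check func(h1 + lam*h2) == func(h1) + lam*func(h2) for ONE choice of inputs.

    A single failure disproves linearity; a single success proves nothing.
    """
    combo = (h1[0] + lam*h2[0], h1[1] + lam*h2[1])
    return abs(func(combo) - (func(h1) + lam*func(h2))) < tol

h1, h2, lam = (1.0, 0.0), (0.0, 1.0), -1.0
print(is_linear_on(g, h1, h2, lam))        # False
print(is_linear_on(dot_map, h1, h2, lam))  # True
```

The underlying reason is that $g(-h)=g(h)$ for every $h$, whereas a linear map must satisfy $g(-h)=-g(h)$; both can hold only when $c=0$.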

If it would have been, there would have been a perfect motivation for the definition. Now, in such a case, why do we call $L(h)$ a linear function in the original definition?

Could someone tell me how this definition came about? What is the motivation behind it? Thanks!

L F
MathMan
  • $\lVert h\rVert f'(p)$ is not a linear function. The linear function is the mapping $h\mapsto f'(p) \cdot h$, and it is this which approximates $f(p+h) - f(p)$. See this answer for the heuristics. – peek-a-boo Aug 24 '20 at 21:00
  • @peek-a-boo yes. If it would have been, there would have been a perfect motivation for the definition. Now, in such a case, why do we call $L(h)$ a linear function in the original definition? – MathMan Aug 24 '20 at 21:02
  • You seem to be misunderstanding what $L(h)$ means. (I'm not sure, but take a look at that answer and if you still have questions then please clarify.) In this setup, $L(h) = f'(p)\cdot h$, so $L(\lambda h_1 + h_2) = f'(p) \cdot (\lambda h_1 + h_2) = \dots = \lambda L(h_1) + L(h_2)$. Often the linear function $L$ is denoted as $Df_p$, or $Df(p)$, or $df_p$ or $df(p)$ (etc. depending on the author). So, $h\mapsto L(h)$ is the linear function, while $f'(p)$ is the matrix representation of $L$ with respect to the standard bases. – peek-a-boo Aug 24 '20 at 21:02
  • @peek-a-boo could you please explain – MathMan Aug 24 '20 at 21:03
  • @MathMan, the linear map in question acts on the tangent space of the surface (the function's graph) at that specific point. The notation you are using looks like "baby Rudin", which is an awfully hard book to learn from. :) Try looking, instead, into Chapter 1 (specifically, fig. 52) in Arnol'd's "Ordinary Differential Equations". If that doesn't help, try V. Zorich's "Mathematical Analysis", vol. 1. – avs Aug 24 '20 at 21:09
  • @peek-a-boo But if we were to go by the basic definition of a derivative, it should be $f'(p)\cdot||h||$. – MathMan Aug 24 '20 at 21:27
  • Try to rewrite things as $f'(p) \cdot h = \lim_{t\to 0} \frac{f(p+th)-f(p)}{t}$. This is actually how a differential is defined. What you wrote as $f'$ is the directional derivative in the direction of $h$. – ECL Aug 24 '20 at 21:40

4 Answers


You said:

  1. $\lim_{h \rightarrow 0} \dfrac{f(p+h)-f(p)}{||h||}$ represents the derivative at $p$

No, it doesn't. Next, you write:

  2. $\dfrac{f(p+h)-f(p)}{||h||}$ can be approximated as $f'(p)+\eta(h)$ where $\eta(h)$ represents an error function.

Again, this is false.


The definition of differentiability is that there exists a linear function $L:\Bbb{R}^m\to \Bbb{R}$ such that \begin{align} f(p+h) - f(p) &= L(h) + \lVert h\rVert\eta(h) \tag{$\ddot{\smile}$} \end{align} where $\lim_{h\to 0} \eta(h) = 0$. Typically, the notation used is that $L := Df_p$. Also, if we define $\Delta f_p(h) := f(p+h) - f(p)$, then the above equation becomes very memorable: \begin{align} \Delta f_p(h) &= Df_p(h) + \lVert h\rVert \eta(h). \end{align} This is exactly the formal way of saying that differentiable functions are locally approximately linear, because it says the actual change in the function (at the point $p$ by an amount $h$) $\Delta f_p(h)$ is equal to a linear part $Df_p(h)$ plus an error term $\lVert h\rVert\eta(h)$, and this error term is "small" in the sense $\eta(h)\to 0$ as $h\to 0$.

So, to address (1) above, it is $L= Df_p$ which is the derivative at $p$ (by definition). For (2), we have \begin{align} \dfrac{f(p+h) - f(p)}{\lVert h \rVert} &= \dfrac{Df_p(h)}{\lVert h\rVert} + \eta(h) \\ &= Df_p\left(\dfrac{h}{\lVert h \rVert}\right) + \eta(h) \end{align} so, you can interpret this however you want. But the point remains: $L(\cdot) = Df_p(\cdot)$ is by definition a linear transformation which approximates changes in $f$ (i.e. which is approximately equal to $\Delta f_p(\cdot)$).
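The identity $\frac{f(p+h)-f(p)}{\|h\|} = Df_p\big(\tfrac{h}{\|h\|}\big) + \eta(h)$ also shows why the limit in (1) generally fails to exist: its value depends on the direction from which $h$ approaches $0$. A minimal sketch, again assuming the illustrative function $f(x,y)=x^2+3y$ at $p=(1,2)$, where $Df_p(u)=2u_1+3u_2$:

```python
import math

def f(x, y):
    # Illustrative function (an assumption): Df_p(u) = 2*u1 + 3*u2 at p = (1, 2)
    return x**2 + 3*y

def quotient_along(p, u, s):
    """(f(p + h) - f(p)) / ||h|| for h = s*u, where u is a unit vector and s > 0."""
    h = (s*u[0], s*u[1])
    return (f(p[0] + h[0], p[1] + h[1]) - f(*p)) / math.hypot(*h)

p = (1.0, 2.0)
for u in [(1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    # Each value approaches Df_p(u): about 2, 3 and -3 respectively
    print(u, quotient_along(p, u, 1e-6))
```

Since different directions give different values, no single number $f'(p)$ can be "the" limit of this quotient; the direction-dependent object $Df_p$ is what survives.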


In the comments you ask:

But, if we were to go by the basic definition of a derivative: it should be $f′(p)\cdot \lVert h\rVert$.

What do you mean by basic definition? The definition you wrote in the question is the definition of derivative in multivariable calculus. Do you mean the definition of single-variable calculus? If that's what you meant then the thing is you need to learn to re-interpret the definition in single variable calculus. We are often taught in the case of $f:\Bbb{R}\to \Bbb{R}$ to think of $f'(p)$ geometrically as "the instantaneous slope at $p$", because we define (if the limit exists) \begin{align} f'(p):= \lim_{h\to 0}\dfrac{f(p+h) - f(p)}{h} \end{align} so of course, geometrically this forces us to think in terms of slopes.

What I'm now suggesting to you is to think in terms of "local linear approximations" (this allows for a much easier transition to multivariable calculus) and to rewrite this definition as \begin{align} \lim_{h\to 0}\dfrac{f(p+h) - f(p) - f'(p)\cdot h}{h} &= 0 \tag{$*$} \end{align} In this case, the mapping $\Bbb{R}\to \Bbb{R}$, $h\mapsto f'(p)\cdot h$ is a linear transformation which approximates the actual change $\Delta f_p(h):= f(p+h) - f(p)$.

Note that in this case, we are able to divide by $h$ because it is a real number and not a vector. But notice that $(*)$ is entirely equivalent to $(**)$: \begin{align} \lim_{h\to 0} \dfrac{|f(p+h) - f(p) - f'(p)\cdot h|}{|h|} &= 0 \tag{$**$} \end{align} and in this form, the relation with definition of differentiability in higher dimensions is much more clear, because $(\ddot{\smile})$ is entirely equivalent to the following statement (with appropriate domains and target spaces):

There exists a linear transformation $L$ such that \begin{align} \lim_{h\to 0}\dfrac{\lVert f(p+h) - f(p) - L(h)\rVert}{\lVert h\rVert} &= 0. \end{align}
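The same test works when the target space is $\Bbb R^n$, with $L$ represented by the Jacobian matrix. A minimal sketch under an illustrative assumption: $f(x,y)=(xy,\ x+y^2)$ at $p=(1,2)$, whose Jacobian there is $\begin{pmatrix}2&1\\1&4\end{pmatrix}$. The error vector works out to $(h_1h_2,\ h_2^2)$, so the ratio tends to $0$.

```python
import math

def f(x, y):
    # Illustrative map R^2 -> R^2 (an assumption): f(x, y) = (x*y, x + y^2)
    return (x*y, x + y**2)

def jacobian(x, y):
    # Matrix of partial derivatives; it represents L = Df_p in the standard bases
    return [[y, x],
            [1.0, 2*y]]

def ratio(p, h):
    """||f(p+h) - f(p) - L(h)|| / ||h||, with L(h) computed as J(p) @ h."""
    J = jacobian(*p)
    Lh = [J[i][0]*h[0] + J[i][1]*h[1] for i in range(2)]
    fp, fph = f(*p), f(p[0] + h[0], p[1] + h[1])
    err = [fph[i] - fp[i] - Lh[i] for i in range(2)]
    return math.hypot(*err) / math.hypot(*h)

p = (1.0, 2.0)
for s in [1e-1, 1e-2, 1e-3]:
    # Along the diagonal h = (s, s) the ratio equals s, up to rounding
    print(s, ratio(p, (s, s)))
```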

peek-a-boo
  • Thank you very much! – MathMan Aug 25 '20 at 10:58
  • Okay. So I get that $f:\Bbb R^m \rightarrow \Bbb R$ is differentiable if $\dfrac {|| f(p+h) - f(p) - (\alpha \cdot h ) ||}{||h||} \rightarrow 0$. I also get that $\alpha \cdot h$ (i.e. the scalar product) is linear. But what prompts us to say that $L(h)$ can be any linear function? Perhaps not every linear function is of the form of the scalar product $\alpha \cdot h$. Thanks again! – MathMan Aug 25 '20 at 11:42
  • @MathMan when you say scalar product do you mean $\alpha\in \Bbb{R}$ and $h\in \Bbb{R}^m$? If yes, then that's completely incorrect. In general, for a linear $L:\Bbb{R}^m\to \Bbb{R}^n$, there is an $n\times m$ matrix (namely the matrix representation of $L$ with respect to standard ordered bases) such that $L(h) = [L]\cdot [h]$ (i.e matrix multiplication). Finally, if by "scalar" product you meant "inner product/dot product", then my advice to you is to not even think in terms of inner products because that is extra structure which is not needed for differentiation. – peek-a-boo Aug 25 '20 at 14:15
  • But if you really want to know, then in the specific case where $L:\Bbb{R}^m\to \Bbb{R}$ (i.e. the target space is $\Bbb{R}$ and the domain is a Hilbert space), it means $L\in (\Bbb{R}^m)^*$ is an element of the dual space, so by the Riesz representation theorem the mapping $\alpha\mapsto \langle \alpha, \cdot \rangle$ from $\Bbb{R}^m$ to $(\Bbb{R}^m)^*$ is an isomorphism. Hence, there is a unique vector $\alpha\in \Bbb{R}^m$ such that for all $h \in \Bbb{R}^m$, $L(h) = \langle \alpha, h\rangle$. Usually when people speak of "the gradient vector of $f$ at $p$", this is what they mean. – peek-a-boo Aug 25 '20 at 14:18

Your main claim is that the derivative of $f$ at $p$ is given by $$f'(p) = \lim_{h\to 0} \frac{f(p+h)-f(p)}{\|h\|}\,.$$ Actually, all your confusion comes from the fact that this definition is wrong.

To see why, just look at the easiest case, $D=\mathbb{R}$. Take $f(x) = x$ and apply your definition at $0$. You know that you must have $f'(0) = 1$, but approaching $0$ from the left, your definition gives $$\lim_{h\to 0^-} \frac{f(0+h)-f(0)}{\|h\|} = \lim_{h\to 0^-}\frac{h}{-h} = -1\,,$$ which is wrong. (From the right the same quotient tends to $+1$, so the two-sided limit does not even exist.)
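The counterexample can be checked directly. A minimal sketch evaluating the proposed quotient $\frac{f(x+h)-f(x)}{|h|}$ for $f(x)=x$ at $x=0$, with $h$ of both signs:

```python
def f(x):
    # The identity function from the counterexample
    return x

def quotient(h, x=0.0):
    # The proposed (incorrect) formula (f(x+h) - f(x)) / |h|
    return (f(x + h) - f(x)) / abs(h)

print([quotient(h) for h in (1e-3, 1e-6)])    # [1.0, 1.0]   (from the right)
print([quotient(h) for h in (-1e-3, -1e-6)])  # [-1.0, -1.0] (from the left)
```

The quotient is exactly the sign of $h$, so it never settles on a single value as $h\to 0$.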

The differential at $p$ is a linear operator $L_p$ as in your definition; when $f$ is differentiable at $p$, it satisfies $$L_p(h) = \lim_{t\to 0} \frac{f(p+th)-f(p)}{t}\,.$$
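This formula can be approximated with a single small $t$. A minimal sketch, assuming the illustrative function $f(x,y)=x^2+3y$ at $p=(1,2)$ (so $L_p(h)=2h_1+3h_2$) and a fixed step $t=10^{-6}$; it also shows that the resulting values are (approximately) additive in $h$, as a linear $L_p$ must be.

```python
def f(x, y):
    # Illustrative function (an assumption): f(x, y) = x^2 + 3y
    return x**2 + 3*y

def directional(p, h, t=1e-6):
    """Approximate L_p(h) = lim_{t->0} (f(p + t h) - f(p)) / t with one small t."""
    return (f(p[0] + t*h[0], p[1] + t*h[1]) - f(*p)) / t

p = (1.0, 2.0)
print(directional(p, (1.0, 1.0)))   # close to 5 = 2*1 + 3*1
print(directional(p, (2.0, -1.0)))  # close to 1 = 2*2 + 3*(-1)
print(directional(p, (3.0, 0.0)))   # close to 6, the sum of the two above
```

A fixed finite $t$ only approximates the limit; the error here is of order $t$.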

ECL

Extremely grateful to @peek-a-boo and @ECL for their valuable answers.

Here's an answer that combines both of these answers and states the motivation for the definition of differentiability of functions on higher-dimensional domains:

  • When $f: \Bbb R \rightarrow \Bbb R$ and $x \in \Bbb R$, we define (provided the limit exists): $$f'(x)=\lim_{h \rightarrow0} \dfrac{f(x+h)-f(x)}{h}$$

  • When $f:\Bbb R^n \rightarrow \Bbb R$ and $X = (x_1,\cdots,x_n) \in \Bbb R^n$, the above definition of differentiability for a function of one variable cannot be generalized directly, since we cannot divide by an element of $\Bbb R^n$.

  • So, in order to provide a definition, we re-arrange the first definition in the following manner:

Let $f:\Bbb R \rightarrow \Bbb R$. Then $f$ is differentiable at $x \iff \exists \alpha \in \Bbb R :\dfrac {|f(x+h)-f(x)-\alpha\cdot h|}{|h|} \rightarrow 0$ as $h \rightarrow 0$. We generalize this definition to functions of several variables:

Let $f:\Bbb R^n \rightarrow \Bbb R$ and $X = (x_1,\cdots,x_n)$. We say that $f$ is differentiable at $X$ if there exists $\alpha =(\alpha_1, \cdots, \alpha_n) \in \Bbb R^n$ such that the error function

$$E(H)=\dfrac{f(X+H)-f(X)-\langle\alpha,H \rangle}{||H||}$$

tends to $0$ as $H$ tends to $0$.

Note that $\langle\alpha,H \rangle$ is the standard scalar product and, for $H=(h_1,\cdots,h_n)$, is a linear mapping given by $L(H)= \langle\alpha,H \rangle = \alpha_1 h_1 + \cdots + \alpha_n h_n$.

Exercise: show that every linear mapping $L:\Bbb R^n \rightarrow \Bbb R$ is necessarily of the form $L(H)=\langle\alpha,H \rangle$ for some $\alpha \in \Bbb R^n$, so introducing the scalar product into our definition loses no generality.
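One way to see the exercise: write $H=\sum_i h_i e_i$ in the standard basis, so linearity gives $L(H)=\sum_i h_i L(e_i)$, i.e. $\alpha_i = L(e_i)$. A minimal sketch for $n=3$, where the linear map `L` below is a hypothetical example treated as a black box:

```python
def L(H):
    # Hypothetical linear map R^3 -> R, treated as a black box
    return 4*H[0] - H[1] + 0.5*H[2]

def representing_vector(L, n):
    """Recover alpha with alpha_i = L(e_i), where e_i is the i-th standard basis vector."""
    alpha = []
    for i in range(n):
        e_i = [0.0]*n
        e_i[i] = 1.0
        alpha.append(L(e_i))
    return alpha

def dot(a, b):
    # Standard scalar product <a, b>
    return sum(x*y for x, y in zip(a, b))

alpha = representing_vector(L, 3)
print(alpha)               # [4.0, -1.0, 0.5]
H = [2.0, 3.0, -4.0]
print(L(H), dot(alpha, H))  # 3.0 3.0
```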

This quantity $\alpha \in \Bbb R^n $ is defined as the derivative of the function $f$ at $X$ and the linear differential $L(H)$ is defined as $\langle\alpha,H \rangle$

MathMan

The motivation is this:

You want to know how $f(x)$ changes under small movements of $x$ within some open set $V$ containing $x$. This can be done by estimating $f(x) \approx T(x)$, where $T(x)$ is a linear function.

If this is true, then you can say: "Oh well, I wanted to know what happens when $x$ moves inside the open set $V$. I can do this with $x+tu$, where $t$ is a tiny number and $u$ a small direction vector, so I examine $$T(x+tu)=T(x)+tT(u).$$ My goal is to find the best $T(x)$ for $f(x)$, so I solve the last equation for $T(x)$: $$T(x) = T(x+tu) - tT(u).$$ If $t$ is as small as possible, then I'm taking the best linear approximation of $f$ through the point $x$."

You can identify some elements of the derivative definition here, such as the factor $t$, which plays the role of your $||h||$; both must be small enough for a good approximation.

L F
  • To the downvoters: it would be awesome if you could write your own answer and also leave a comment on how to improve mine. Once I fix things according to your recommendations (because the only thing you do is downvote and give zero recommendations), I hope you will remove your downvote. – L F Aug 24 '20 at 21:19
  • Thanks for the answer. I didn't downvote it, but my question basically was: how did we arrive at a linear function from the basic definition of a derivative? – MathMan Aug 24 '20 at 21:23
  • As far as I know, it's because linear functions are so simple. We want to explain complicated things in terms of basics we know, and the most basic meaningful function is a linear one. – L F Aug 24 '20 at 21:25