motivation of the need to use higher order infinitesimal when defining derivative

Question

I am trying to derive the concepts of derivative and differential from limits and linear approximation, as a way of reviewing the subject. Partway through, I can't figure out the motivation for requiring the error to be a higher order infinitesimal when defining the derivative.

Here is what I did

[Step 1]: I start by supposing the only thing I know is the concept of limit. And I should proceed to develop the notion of derivative from the idea of linear approximation

[Step 2]: Take a single variable real function $f(x)$ as an example, and suppose $f(x)$ is defined on an interval $[a-r, a+r]$ where $r$ is positive. What I want to do is to estimate the value of $f(x)$ for any $x$ within this interval by using a linear approximation: $$f(x)=A(x-a)+f(a)+E$$ where $E$ is the error of approximation $$E=[f(x)-f(a)]-A(x-a)$$

[Step 3]: As the constant $A$ can be chosen arbitrarily, if I want to bring in the notion of derivative, I have to find some sort of motivation that requires me to find a specific $A$ which satisfies $$\lim\limits_{x\to a}\frac{E}{(x-a)}=0$$ that is to say, the requirement for $A$ is that it has to make $E$ a higher order infinitesimal with respect to $(x-a)$.
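
As a concrete check of what this requirement does (the function $f(x)=x^2$ is only an illustrative choice, not part of the setup above), the error factors as $$E=(x^2-a^2)-A(x-a)=(x-a)(x+a-A)$$ so that $$\lim\limits_{x\to a}\frac{E}{(x-a)}=\lim\limits_{x\to a}(x+a-A)=2a-A$$ which is zero only for $A=2a$. For any other $A$ the error still tends to zero as $x\to a$, but only at the same rate as $(x-a)$ itself.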

Question:

[Q]: So what is actually the motivation behind such a requirement?

[Q1.1]: I understand that from the geometrical perspective it signifies the tangent line, but then what makes the tangent line so special that I should use its slope as my linear approximation constant $A$?

[Q1.2]: Moreover, what is the algebraic motivation behind such a requirement, without considering the geometrical interpretation?

I have also wondered about this. For a function $f:\mathbb R^n \to \mathbb R^m$ to be differentiable at a point $a$ means that there exists an $m \times n$ matrix $A$ such that the residual $r(x)$ in the equation $f(x) = f(a) + A(x-a) + r(x)$ is “small” when $x$ is close to $a$ (this $A$ is then called $f'(a)$). But what does “small” mean? How do I see directly that $\lim_{x \to a} r(x)/|x-a| = 0$ is the “right” definition of small? I recognize that this definition has the virtue of being equivalent to the definition of differentiable in single variable calc when $m = n = 1$. But still, I want to see it more directly. — littleO, Jul 27 '21 at 06:30

P'bD_KU7B2 · Answer 1 · 2021-07-30T09:42:11.890

I will try to answer this question myself (I don't know if this is the "right" answer, but I will just throw it here as it's better than nothing).

What I want is to derive the concepts of derivative and differential by using only the concepts of limit and linear approximation. As I mentioned in my [Step 3], if I just want to approximate the value of $f(x)$ by the linear equation: $$f(x)=f(a)+A(x-a)+E=f(a)+A\Delta x+E\ \ \ \ \ \ (1)$$ then there are infinitely many choices of $A$ for me to pick.

So the question becomes: What kind of $A$ do I actually want ? or What kind of $A$ is "nice" enough ?

Now there are two goals of linear approximation (which I want $A$ to satisfy):

I want the error $E$ to be "small" enough so my calculation is accurate even if I choose to ignore $E$ from the equation.
For the sake of predictability and convenience, I want my approximation to become more accurate when I perform a single operation (or do something), so I can tell someone or a computer how to improve the accuracy during calculation (or when the approximation is not accurate enough).

[Consider the first goal above], "small" with respect to what? There are three terms on the right side of equation (1). Because $f(a)$ is a constant, the only two terms that affect the accuracy of my approximation are $A\Delta x$ and $E$, that is to say I want $E$ to be "small" with respect to $A\Delta x$. This means the value of the fraction $$\frac{E}{A\Delta x}\ \text{is very small}.$$

[Consider the second goal above], I realize that $\frac{E}{A\Delta x}$ cannot always be very small. What I am looking for is an operation that will make it smaller (or larger) so I can tell someone or a computer what to do (or not to do) to improve the accuracy.

Now, since the value of $E$ depends on $\Delta x$ for different choices of $x$, the only two options I have are to let $\Delta x \to 0$ or $\Delta x \to \infty$. Any other option would likely involve letting $\Delta x$ be some kind of complicated function of $x$. Not only would that defeat the purpose of linear approximation (for example, if $\Delta x$ were a parabolic function of $x$, then why bother doing linear approximation in the first place? I should just do a parabolic approximation!), it would also likely involve more than one operation during approximation (which is not good if there are too many operations for a person or a computer to carry out).

So now I need to evaluate the two operations above:

It is obvious that I cannot guarantee $\frac{E}{A\Delta x}$ to keep becoming smaller when $\Delta x \to \infty$.
It seems possible for me to find an $A$ such that the value of $\frac{E}{A\Delta x}$ keeps decreasing when $\Delta x \to 0$. This means I should probably look at the following limit $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$$

With the above two considerations, I should try to develop the requirement on $A$:

Assume $\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$ exists. What I want is for $\frac{E}{A\Delta x}$ to become smaller as $\Delta x \to 0$, so at best I should expect this limit to be zero (and the "nicest $A$" should at least satisfy this condition). Assuming $A \neq 0$, so that dividing by $A$ makes sense, that is: $$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}=\frac{1}{A}\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$ $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$ Now I can say $E$ is a higher order infinitesimal of $\Delta x$. I can then express $E$ with the following equation $$E=\epsilon\Delta x\text{, where } \lim\limits_{\Delta x \to 0}\epsilon=0$$

Substituting $E=\epsilon\Delta x$ back into equation (1) above, I have $$f(x)=f(a)+A\Delta x+\epsilon\Delta x$$ $$A+\epsilon=\frac{f(x)-f(a)}{\Delta x}$$ Taking the limit of both sides, and using the fact that $A$ is a constant, $$\lim\limits_{\Delta x \to 0}(A+\epsilon)=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$\lim\limits_{\Delta x \to 0}A + \lim\limits_{\Delta x \to 0}\epsilon=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$ $$A=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$

Now I can define $A$ to be the derivative, and $A\Delta x$ and $\Delta x$ to be the differentials (of $f$ and of $x$ respectively). I can also claim that at each point $(a,f(a))$ in the interval, such an $A$ is unique, because the limit $\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$ is unique whenever it exists.

(I can also claim now this unique $A$ is the "nicest $A$" because it is the only one that satisfies the least requirement of the "nicest $A$")
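
A minimal numerical sketch of the argument above, under assumptions that are not part of the derivation: I pick $f=\sin$ and $a=1$ purely for illustration, and `error_ratio` is a hypothetical helper name. It prints $E/\Delta x$ for a few candidate values of $A$ as $\Delta x$ shrinks.

```python
import math

def error_ratio(f, a, A, dx):
    """Return E/dx, where E = f(a + dx) - f(a) - A*dx is the error in equation (1)."""
    E = f(a + dx) - f(a) - A * dx
    return E / dx

a = 1.0  # illustrative base point (an assumption, not from the derivation)
candidates = {
    "A = cos(1)": math.cos(a),  # the limit of the difference quotient for sin at a = 1
    "A = 0.5": 0.5,             # two arbitrary competitors
    "A = 0.6": 0.6,
}

for label, A in candidates.items():
    # dx = 1e-1, 1e-2, ..., 1e-5
    ratios = [error_ratio(math.sin, a, A, 10.0 ** -k) for k in range(1, 6)]
    print(label, [f"{r:+.2e}" for r in ratios])
```

For the competitors the ratio should settle near the nonzero constant $\cos(1)-A$, while for $A=\cos(1)$ it should keep shrinking roughly in proportion to $\Delta x$. This is only numerical evidence for one function, not a proof.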

Thus I have successfully brought in the concepts of derivative and differential by using only the concepts of limit and linear approximation. The function is said to be differentiable at $a$ when there exists an $A$ such that $$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$

score 2 · Accepted Answer · answered Jul 29 '21 at 01:05

This answer is heavily inspired by (for instance) this answer of Milo Brandt to How is the derivative truly, literally the “best linear approximation” near a point?, but the viewpoint of the other question is different enough that this might not be a duplicate.

Basically, one motivation is that we want to be sure that we choose the best linear approximation, if one exists. What might it mean for one linear approximation to be (locally) better than another? Imagine two candidates $A_1$ and $A_2$, so that $f(x)=A_1(x-a)+f(a)+E_1(x)=A_2(x-a)+f(a)+E_2(x)$. Far from the point $a$, we would expect a lot of changes in whether $E_1$ or $E_2$ is larger (in absolute value). But we could say $A_1$ is "definitively at least as good" as $A_2$ if $|E_1(x)|\le|E_2(x)|$ on some (possibly tiny) interval around $a$, $(a-\varepsilon,a+\varepsilon)$.

Now, suppose that $A$ is "definitively at least as good" as any other linear approximation (different $\varepsilon$s would be needed for each candidate). It turns out that forces $\displaystyle{\lim_{x\to a}}\,\dfrac{E(x)}{x-a}=0$. (See If there is a linear function $g$ which is at least as good of an approximation as any other linear $h$, then $f$ is differentiable at $x_0$. for approaches to proving this.)
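
A rough numerical sketch of the "definitively at least as good" comparison, under assumptions that are mine and not from the linked answers: $f=\exp$, $a=0$, $A_1=1$ (the derivative of $\exp$ at $0$), $A_2=1.1$, and a finite sample grid standing in for the interval. The names `error` and `at_least_as_good` are hypothetical.

```python
import math

def error(f, a, A, dx):
    """E(x) = f(x) - f(a) - A*(x - a), written in terms of dx = x - a."""
    return f(a + dx) - f(a) - A * dx

def at_least_as_good(f, a, A1, A2, eps, n=500):
    """Check |E_1| <= |E_2| at 2n+1 equally spaced sample points in [a-eps, a+eps].

    A grid check is only evidence about the sampled points, not a proof
    for the whole interval.
    """
    for k in range(-n, n + 1):
        dx = eps * k / n
        if abs(error(f, a, A1, dx)) > abs(error(f, a, A2, dx)):
            return False
    return True

# Small interval around a = 0: the derivative A_1 = 1 should win everywhere sampled.
print(at_least_as_good(math.exp, 0.0, 1.0, 1.1, eps=0.05))
# Much larger interval: A_2 = 1.1 does better at some points away from a.
print(at_least_as_good(math.exp, 0.0, 1.0, 1.1, eps=0.5))
```

With these assumed values the first check should pass and the second should fail, which matches the remark that far from $a$ the ordering of $|E_1|$ and $|E_2|$ can change, and that the $\varepsilon$ needed depends on the competitor.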

score 0 · Answer 3 · answered Jul 27 '21 at 05:09

The derivative comes straight from the formula for the slope of a secant line. If you take a point $(x,f(x))$ and move a small distance $h$ away, giving a second point on the curve $(x+h,f(x+h))$, the slope of the line connecting them is our difference quotient $$\frac {f(x+h)-f(x)}{h}.$$

Now zoom in closer and closer to the given point $x$, which means taking $h$ smaller and smaller. If a function is sufficiently "nice", aka differentiable, it starts to look more and more like a straight line, so the slopes of these secant lines get closer and closer to the slope of the tangent line at that point.

So by taking the limit as $h\to 0$, what we are doing is shrinking the width of our secant lines down to 0.
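
As a small sketch of this shrinking process (the function $f(t)=t^2$, the point $x=3$, and the helper name `secant_slope` are illustrative assumptions), the secant slopes can be tabulated for decreasing $h$:

```python
def secant_slope(f, x, h):
    """Slope of the secant line through (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(t):
    return t * t

# h = 1e-1, 1e-2, ..., 1e-5
for k in range(1, 6):
    h = 10.0 ** -k
    print(h, secant_slope(square, 3.0, h))
```

For this particular function the secant slope is $6+h$ up to rounding, so the printed values should approach $6$, the slope of the tangent line at $x=3$.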

Since "nice" functions locally act like a line, we can approximate the function NEAR that point as if it were a line with that slope.

This would be the usual way to do it: Defining what the derivative is first and then using $\frac{\Delta y}{\Delta x}=f'(x)+\epsilon$ to see that $$\Delta y=f'(x)\Delta x+\epsilon\Delta x$$ and it is not hard to see the error of linear approximation $E=\epsilon\Delta x$ is a higher order infinitesimal of $\Delta x$. But what I want to do is actually in reverse: I want to do the linear approximation first before defining derivative, and then construct the concept of derivative, which is why I asked what is the motivation for its requirement. — P'bD_KU7B2, Jul 27 '21 at 05:56
