I will try to answer this question myself (I don't know if this is the "right" answer, but I will just throw it here as it's better than nothing).
What I want is to derive the concept of derivative & differential by only using the concept of limit and linear approximation. As I mentioned in my [Step 3], if I just want to approximate the value of $f(x)$ by the linear equation:
$$f(x)=f(a)+A(x-a)+E=f(a)+A\Delta x+E\ \ \ \ \ \ (1)$$
then there are infinitely many choices of $A$ for me to pick.
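To make the "infinitely many choices" point concrete, here is a minimal sketch. The function $f(x)=x^2$, the points $a=1$, $x=1.5$, and the helper name `error_term` are my own illustrative assumptions, not part of the question. For any $A$ whatsoever, defining $E$ as the leftover makes equation (1) hold exactly:

```python
def error_term(f, a, x, A):
    """Leftover E that makes f(x) = f(a) + A*(x - a) + E hold (equation (1))."""
    return f(x) - f(a) - A * (x - a)


def f(x):
    # Illustrative choice of function (an assumption for this sketch).
    return x * x


a, x = 1.0, 1.5
for A in (-2.0, 0.0, 0.5, 2.0, 3.0):
    E = error_term(f, a, x, A)
    rebuilt = f(a) + A * (x - a) + E  # right-hand side of equation (1)
    print(f"A = {A:5.1f}   E = {E:7.3f}   f(a) + A*dx + E = {rebuilt:.3f}")
```

Every row rebuilds the same value $f(x)$, so equation (1) alone cannot single out a particular $A$.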
So the question becomes: what kind of $A$ do I actually want? Or, what kind of $A$ is "nice" enough?
Now there are two goals of linear approximation (which I want $A$ to satisfy):
- I want the error $E$ to be "small" enough so my calculation is accurate even if I choose to ignore $E$ from the equation.
- For the sake of predictability and convenience, I want my approximation to become more accurate when I perform a single operation (or do something), so that I can tell a person or a computer how to improve the accuracy during a calculation (or recognize when the approximation is not accurate enough).
[Consider the first goal above], "small" with respect to what? There are three terms on the right side of equation (1). Because $f(a)$ is a constant, the only two terms that affect the accuracy of my approximation are $A\Delta x$ and $E$; that is to say, I want $E$ to be "small" with respect to $A\Delta x$. This means the value of the fraction
$$\frac{E}{A\Delta x}\ \text{is very small}.$$
[Consider the second goal above], I realize that $\frac{E}{A\Delta x}$ cannot always be very small. What I am looking for is an operation that will make it smaller (or larger) so I can tell someone or a computer what to do (or not to do) to improve the accuracy.
Since the value of $E$ depends on $\Delta x$ for different choices of $x$, the only two options I have are to let $\Delta x \to 0$ or $\Delta x \to \infty$. Any other option would likely involve letting $\Delta x$ be some kind of complicated function of $x$. Not only does this defeat the purpose of linear approximation (for example, if $\Delta x$ is a quadratic function of $x$, then why bother doing linear approximation in the first place? I should just do a parabolic approximation!), but it would also likely involve more than one operation during the approximation (which is not good if there are too many operations for a person or a computer to carry out).
So now I need to evaluate the two operations above:
- It is obvious that I cannot guarantee $\frac{E}{A\Delta x}$ to keep becoming smaller when $\Delta x \to \infty$.
- It seems possible for me to find an $A$ such that the value of $\frac{E}{A\Delta x}$ keeps decreasing when $\Delta x \to 0$.
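As a quick numerical illustration of the two options (one example, not a proof), the sketch below assumes $f(x)=x^2$, $a=1$ and $A=2$; these values and the helper name `relative_error` are my own choices. For this example the ratio works out algebraically to $\Delta x/2$, so it grows without bound as $\Delta x$ grows and shrinks as $\Delta x$ shrinks:

```python
def relative_error(f, a, dx, A):
    """E / (A * dx), where E = f(a + dx) - f(a) - A * dx."""
    E = f(a + dx) - f(a) - A * dx
    return E / (A * dx)


def f(x):
    # Illustrative choice of function (an assumption for this sketch).
    return x * x


a, A = 1.0, 2.0
for dx in (1000.0, 10.0, 0.1, 0.001):
    print(f"dx = {dx:8}   E/(A*dx) = {relative_error(f, a, dx, A):.6g}")
```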
This means I should probably look at the following limit:
$$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$$
With the above two considerations in mind, I should try to work out the requirement on $A$:
Assume $\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}$ exists. What I want is for $\frac{E}{A\Delta x}$ to become smaller as $\Delta x \to 0$, so at best I should expect this limit to be zero (and the "nicest $A$" should at least satisfy this). That is, assuming $A \neq 0$ so that the constant $\frac{1}{A}$ can be pulled out of the limit:
$$\lim\limits_{\Delta x \to 0}\frac{E}{A\Delta x}=\frac{1}{A}\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$
$$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$
Now I can say $E$ is a higher-order infinitesimal than $\Delta x$. I can then express $E$ with the following equation
$$E=\epsilon\Delta x\text{, where } \lim\limits_{\Delta x \to 0}\epsilon=0$$
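To see how restrictive the condition $\epsilon \to 0$ is, here is a small sketch that computes $\epsilon = E/\Delta x$ for a few candidate values of $A$. The function $f(x)=x^2$, the point $a=1$, and the helper name `epsilon` are illustrative assumptions of mine. For this example $\epsilon = 2 - A + \Delta x$ algebraically, so only $A=2$ lets $\epsilon$ shrink to zero:

```python
def epsilon(f, a, dx, A):
    """epsilon = E / dx, where E = f(a + dx) - f(a) - A * dx."""
    return (f(a + dx) - f(a) - A * dx) / dx


def f(x):
    # Illustrative choice of function (an assumption for this sketch).
    return x * x


a = 1.0
for A in (1.0, 2.0, 3.0):
    values = [epsilon(f, a, dx, A) for dx in (1e-1, 1e-3, 1e-6)]
    print(f"A = {A}:  epsilon for dx = 1e-1, 1e-3, 1e-6 ->",
          [f"{v:.6f}" for v in values])
```

For the other candidates $\epsilon$ settles near a nonzero constant, which is exactly what the requirement rules out.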
Substituting $E=\epsilon\Delta x$ back into equation (1) above, I have
$$f(x)=f(a)+A\Delta x+\epsilon\Delta x$$
$$A+\epsilon=\frac{f(x)-f(a)}{\Delta x}$$
And it is not hard to see that, since $A$ is a constant and $\epsilon \to 0$,
$$\lim\limits_{\Delta x \to 0}(A+\epsilon)=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$
$$\lim\limits_{\Delta x \to 0}A + \lim\limits_{\Delta x \to 0}\epsilon=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$
$$A=\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$$
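The limit above can be watched numerically. This is only a sketch under my own assumptions ($f(x)=x^2$ at $a=1$, where the limit is $2$; the helper name `difference_quotient` is hypothetical), and floating-point rounding means $\Delta x$ should not be taken too small:

```python
def difference_quotient(f, a, dx):
    """(f(a + dx) - f(a)) / dx, which equals A + epsilon in the text."""
    return (f(a + dx) - f(a)) / dx


def f(x):
    # Illustrative choice of function (an assumption for this sketch).
    return x * x


a = 1.0
for dx in (1e-1, 1e-2, 1e-3, 1e-4, 1e-5):
    q = difference_quotient(f, a, dx)
    # q - 2 plays the role of epsilon, since A = 2 for this example.
    print(f"dx = {dx:.0e}   quotient = {q:.6f}   quotient - 2 = {q - 2.0:.2e}")
```

The quotient approaches $2$, and the gap (the $\epsilon$ of the derivation) shrinks together with $\Delta x$.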
Now I can define $A$ to be the derivative of $f$ at $a$, and $A\Delta x$ to be the differential of $f$ (with $\Delta x$ itself being the differential of $x$). I can also claim that, at each point of the interval where the limit $\lim\limits_{\Delta x \to 0}\frac{f(x)-f(a)}{\Delta x}$ exists, such an $A$ is unique, because a limit is unique whenever it exists.
(I can also claim now that this unique $A$ is the "nicest $A$", because it is the only one that satisfies the minimum requirement for the "nicest $A$".)
Thus I have successfully brought in the concepts of derivative and differential by using only the concepts of limit and linear approximation. And the function is said to be differentiable at $a$ when there exists an $A$ such that
$$\lim\limits_{\Delta x \to 0}\frac{E}{\Delta x}=0$$