Why do differentiation rules work? What's the intuition behind them? (Not asking for proofs)

Question

Differentiation rules have been bugging me ever since I took Basic Calculus. I thought I'd develop some intuitive understanding of them eventually, but so far all my other math courses (including Multivariable Calculus) take the rules for granted.

I know how to prove some of the rules. The problem is that algebra manipulation alone isn't quite convincing to me. Is there any possibility of understanding why the algebra happens to work that way? For example, why do the slopes of the tangent line to the parabola x^2 happen to be determined by 2x? Looking at it graphically, there's no way I could've told that.

Any sources covering this issue (books; internet sites; etc) would be very greatly appreciated. Thanks in advance.

For the basic formulas such as derivative of $x^2$ is $2x$, you can always try to calculate using the definition $$\lim_{h\rightarrow 0} \frac{(x+h)^2 - x^2}{h}.$$ — Xiao, Oct 02 '15 at 16:07
@HagenvonEitzen Actually it can...intuition is an important part in how proofs are made. — Zach466920, Oct 02 '15 at 16:43
@Matt24 Would you be convinced that the derivative of $ax^2+bx+c$ is $2ax+b$ without the use of the limit definition of the derivative by just using non calculus math material? — imranfat, Oct 02 '15 at 17:00
@imranfat Newton didn't have the limit definition, and he was convinced. :) People were doing calculus for over a century before they had a definition of limit. — Thomas Andrews, Oct 02 '15 at 17:11
For $x^2$ (or, more generally, $ax^2+bx+c$), you might be interested in reading this answer to the seemingly-unrelated "Derivation of the formula for the vertex of a parabola". — Blue, Oct 02 '15 at 17:14
You might want to add the (soft-question) tag to this question. — Math1000, Oct 02 '15 at 20:00
@Zach466920 Does intuition creating motivation for a new proof make the intuition more convincing than the proof? Nothing can be more convincing than a proof. A proof ultimately removes any other possibilities of the truth. — user236182, Oct 03 '15 at 12:43
@user236182 Unless it's the continuum hypothesis, godels theorem, etc... — Zach466920, Oct 03 '15 at 13:00
The reason we have any notion of calculation is because we can't look at most problems and just know the answer. — , Oct 03 '15 at 13:10
@ThomasAndrews. Yes absolutely, but that was our Big Newton... but my point was that indeed a lot of calculus can be done without the tools we now call calculus...Finding the "derivative" of a quadratic is one (as I mentioned above), finding the area inside the parabola another (Archimedes) and more... — imranfat, Oct 03 '15 at 14:23
@ThomasAndrews Yeah, but I suppose Newton's opinion was more guided by $(x+h)^2=x^2+2hx+h^2$ than by geometric constructions involving the tangent to a parabola as obtained from the intersection of a cone and an ax-parallel plane — Hagen von Eitzen, Oct 04 '15 at 09:56

score 64 · Answer 1 · edited Oct 03 '15 at 02:36

The key intuition, first of all, is that the product of two tiny differences is negligible. You can intuit this just by doing computations:

$$3.000001 \cdot 2.0001 = 6.0003020001$$

If we are doing any sort of rounding of hand computations, we'd likely round away that $0.0000000001$ part. If you were doing computations to eight significant digits, a value $v$ is really a value in a range roughly of $v\left(1 \pm 10^{-8}\right)$ and the error when you multiply $v_1$ by $v_2$ is almost entirely $10^{-8}|v_1v_2|$. The other part of the error is so tiny you'd probably ignore it.
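The same computation can be redone with exact fractions to see how small the cross term really is. This is only a sketch; the variable names `a`, `da`, `b`, `db` are my own labels for the two numbers and their small differences.

```python
from fractions import Fraction

# Exact rational arithmetic, so rounding cannot hide the tiny cross term.
a, da = Fraction(3), Fraction(1, 10**6)   # 3.000001 = a + da
b, db = Fraction(2), Fraction(1, 10**4)   # 2.0001   = b + db

exact = (a + da) * (b + db)
first_order = a * b + a * db + b * da     # everything except da * db
cross_term = da * db                      # product of the two tiny differences

print(float(exact))        # 6.0003020001
print(float(first_order))  # 6.000302
print(float(cross_term))   # 1e-10
```

The only thing dropped in `first_order` is `da * db`, which is the part you would round away.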

Case: $f(x)=x^2$

Now, consider a square with corners $(0,0), (0,x), (x,0), (x,x)$. Grow $x$ a little bit, and you see the area grows in proportion to the length of two of the edges, plus a tiny little square. That tiny square is negligible.

This is a little harder to visualize for $x^n$, but it actually works the same way when $n$ is a positive integer, by considering an $n$-dimensional hypercube.

This geometric reason is also why the circumference of a circle is equal to the derivative of its area – if you increase the radius a little, the area is increased by approximately that "little" times the circumference. So the derivative of $\pi r^2$ is the circumference of the circle, $2\pi r$.

It's also a way to understand the product rule. (Or, indeed, FOIL.)
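Here is a rough sketch of the growing-square picture, written for a general rectangle so that it also covers the product rule. The helper `area_growth` and the sample side length are hypothetical choices of mine, not part of the argument above.

```python
from fractions import Fraction

def area_growth(u, v, du, dv):
    """Split the growth of a u-by-v rectangle into two edge strips and a corner."""
    strips = u * dv + v * du              # two thin strips along the edges
    corner = du * dv                      # the tiny rectangle in the corner
    total = (u + du) * (v + dv) - u * v   # actual change in area
    return total, strips, corner

x = Fraction(5)
for h in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    total, strips, corner = area_growth(x, x, h, h)   # square: u = v = x
    # strips / h is exactly 2x, while corner / h equals h and shrinks with it
    print(float(total / h), float(strips / h), float(corner / h))
```

Per unit of growth, the strips always contribute $2x$, and the corner's share fades as the growth gets smaller.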

Case: The chain rule

The chain rule is better seen by considering an odd-shaped tub. Let's say that when the volume of the water in the tub is $v$ then the tub is filled to depth $h(v)$. Then assume that we have a hose that, between time $0$ and time $t$, has delivered a volume $v(t)$ of water.

At time $t$, what is the rate that the height of the water is increasing?

Well, we know that when the current volume is $v$, then the rate at which the height is increasing is $h'(v)$ times the rate the volume is increasing. And the rate the volume is increasing is $v'(t)$. So the rate the height is increasing is $h'(v(t)) \cdot v'(t)$.
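If you want to see the tub argument with numbers, here is a minimal sketch. The particular hose `v(t)` and tub shape `h(vol)` are made up for illustration, and `slope` is just a central-difference estimate of a derivative.

```python
import math

def v(t):
    """Hypothetical hose: volume delivered by time t."""
    return 2.0 * t + 0.5 * t * t

def h(vol):
    """Hypothetical tub shape: water depth as a function of volume."""
    return math.sqrt(vol)

def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

t = 3.0
direct = slope(lambda s: h(v(s)), t)      # rate of depth change, measured directly
chained = slope(h, v(t)) * slope(v, t)    # h'(v(t)) * v'(t)
print(direct, chained)                    # the two estimates agree closely
```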

Case: Inverse function

This is the one case where it is obvious from the graph. When you flip the coordinates of a Cartesian plane, a line of slope $m$ gets sent to a line of slope $1/m$. So if $f$ and $g$ are inverse functions, then the slope of $f$ at $(x,f(x))$ is the reciprocal of the slope of $g$ at $(f(x),x)=(f(x),g(f(x)))$. So $g'(f(x))=1/f'(x)$.
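A quick numerical sanity check of the reciprocal-slope claim, assuming the example pair $f(x)=x^3$ and $g(y)=y^{1/3}$ (my choice, any invertible pair would do):

```python
def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def f(x):
    return x ** 3

def g(y):
    """Inverse of f for positive y."""
    return y ** (1.0 / 3.0)

x = 2.0
slope_f = slope(f, x)        # slope of f at (x, f(x)), close to 12
slope_g = slope(g, f(x))     # slope of g at (f(x), x), close to 1/12
print(slope_f, slope_g, slope_f * slope_g)
```

The product of the two slopes comes out very close to 1, as the flipped-graph picture predicts.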

$x^2$ revisited

Another way of dealing with $f(x)=x^2$ is thinking again of area, but thinking of it in terms of units. If we have a square whose side is $x$ centimeters, and we change the side by a small amount, $\Delta x$ centimeters, then the area is $x^2\,\mathrm{cm}^2$ and it changes by approximately $f(x+\Delta x)-f(x)\approx f'(x)\Delta x$ square centimeters.

On the other hand, if we measure the square in meters, it has side length $x/100$ meters and area $(x/100)^2$ square meters. The change in the side length is $(\Delta x)/100$ meters. So the expected area change is $f'(x/100)\cdot (\Delta x)/100$ square meters. But this difference should be the same, so $$f'(x)\Delta x = f'(x/100)\cdot\frac{\Delta x}{100}\cdot \left(100^2\,\text{cm}^2/\text{m}^2\right) = 100 f'(x/100)\Delta x,$$ that is, $f'(x) = 100 f'(x/100)$.

More generally, then, we see that $f'(ax)=af'(x)$ when $f(x)=x^2$ by changing units from centimeters to a unit that is $1/a$ centimeters.

So we see that $f'(x)$ is linear, although it doesn't explain why $f'(1)=2$.

If you do the same for $f(x)=x^n$, with units $\mu$ and another unit $\rho$ where $a\rho = \mu$, then you get that the change in volume when changing by $\Delta x\,\mu$ is $f'(x)\Delta x\,\mu^n$. It is also $f'(ax)\cdot a(\Delta x)\,\rho^n$. Since $\mu/\rho = a$, this means $f'(ax) =a^{n-1}f'(x)$.

Again, we still don't know why $f'(1)=n$, but we know $f'(x)=f'(1)x^{n-1}$.
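The scaling relation $f'(ax)=a^{n-1}f'(x)$ is easy to spot-check numerically. This is only a sketch with arbitrary sample values for `n`, `x` and `a`, and `slope` is a finite-difference stand-in for the derivative.

```python
def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

n = 4

def f(x):
    return x ** n

x, a = 1.5, 3.0
lhs = slope(f, a * x)               # f'(ax)
rhs = a ** (n - 1) * slope(f, x)    # a^(n-1) * f'(x)
print(lhs, rhs)                     # both close to n * (a*x)**(n-1) = 364.5
```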

The chain rule can also be intuited from canceling differentials (Leibniz's notation practically begs for this), but then you have to explain what a differential is and why you're allowed to cancel them without e.g. having to use L'Hospital's rule. — Kevin, Oct 03 '15 at 22:32
What is FOIL? I've never encountered that acronym (I assume it is one, as it's all uppercase). — celtschk, Oct 04 '15 at 06:29
FOIL is a high school algebra acronym, but fairly common in the US. It describes how to multiply out $(a+b)(c+d)$ - "First, Outer, Inner, Last." - to $ac+ad+bc+bd$. @celtschk — Thomas Andrews, Oct 04 '15 at 15:39
@Kevin The other difficulty (which is actually reflected in the proof) is that you must show that if $f$ is differentiable at $g(x)$ and $g$ is differentiable at $x$ with $g'(x)=0$, then $(f \circ g)'(x)=0$. In a sense this is separate from the rest of the proof of the chain rule; it just happens to be consistent with the rest of it. — Ian, Oct 11 '15 at 16:24

score 23 · Answer 2 · answered Oct 02 '15 at 17:57

For the first hundred years or so, before people formalized differentiation and integration by using limits, the general intuition behind taking the derivative of $f(x)$ was, "Let's add a tiny increment to $x$ and see how much $f(x)$ changes."

The "tiny increment" was called $o$ (lower-case letter O), at least by some people.

For $f(x) = x^2$, for example, you could show that $$f(x + o) = (x + o)^2 = x^2 + 2xo + o^2 = f(x) + 2xo + o^2.$$ So the amount of "change" in $f(x)$ is $2xo + o^2$, which is $2x + o$ times the amount by which you changed $x$. And then the mathematicians would say that only the $2x$ part of $2x + o$ matters, since $o$ is "vanishingly" small.
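To watch the leftover $o$ vanish, here is a small sketch in exact arithmetic; the sample point and the particular increments are arbitrary choices of mine.

```python
from fractions import Fraction

def f(x):
    return x * x

x = Fraction(3)
for o in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 10**6)):
    ratio = (f(x + o) - f(x)) / o     # change in f per unit change in x
    # ratio is exactly 2x + o, so only the 2x part survives as o shrinks
    print(ratio, "=", 2 * x, "+", o)
```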

I think for most of the differentiation rules developed back then (which may be all you'll see in the table of derivatives in an elementary calculus book), the intuition was to do the arithmetic. What they did not do was to encumber that arithmetic with all the extra mechanisms needed to establish a limit, as the standard-analysis approach does today.

On the other hand, the arithmetic usually went hand-in-hand with practical problems (usually in what we would consider physics or engineering) that people wanted to solve. People also tended to make a connection between arithmetic and geometry, so linking the function $f(x) = x^2$ to the area of a square of side $x$ would have been an obvious thing to do (and the visualization in Thomas Andrews's answer would have worked very well, I think).

For example, visualize a particle running along a circular track at a constant speed. In fact, make the circular track be the circle given by $x^2 + y^2 = 1$ in the Cartesian plane. (Putting everything into Cartesian coordinates was all the rage when calculus was young.) You can then see (by symmetry, or by other arguments) that the direction the particle is going is always perpendicular to the direction in which the particle lies from the center of the circle at that moment. So if the angle to the particle at that instant is $\theta$, the $y$-coordinate of the particle is $\sin\theta$, but the velocity vector is pointing in a direction $\frac\pi2$ radians "ahead" of $\theta$, and if we let $\theta$ increase at the rate of $1$ radian per unit of time the magnitude of the velocity is $1$, so its $y$-coordinate is $\sin\left(\theta + \frac\pi2\right) = \cos\theta$, which is the derivative of $\sin\theta$ when $\theta$ is measured in radians.
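The circular-track picture can be checked numerically. In this sketch the particle sits at $(\cos\theta, \sin\theta)$ and moves at one radian per unit time; `velocity` is a finite-difference estimate, and the sample angle is arbitrary.

```python
import math

def position(theta):
    """Point on the unit circle at angle theta."""
    return (math.cos(theta), math.sin(theta))

def velocity(theta, eps=1e-6):
    """Finite-difference velocity when theta grows at 1 radian per unit time."""
    (x0, y0), (x1, y1) = position(theta - eps), position(theta + eps)
    return ((x1 - x0) / (2 * eps), (y1 - y0) / (2 * eps))

theta = 0.7
px, py = position(theta)
vx, vy = velocity(theta)
ahead = position(theta + math.pi / 2)   # the point a quarter turn further on

print(px * vx + py * vy)   # close to 0: velocity is perpendicular to the radius
print((vx, vy), ahead)     # velocity matches the point a quarter turn ahead
```

The vertical component of the velocity is the rate of change of $\sin\theta$, and it agrees with $\cos\theta$.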

score 13 · Answer 3 · answered Oct 02 '15 at 20:10

The derivative is the study of linear approximation. For example, $$ (x+\delta)^{2}=x^{2}+2x\delta + \delta^{2}. $$ The linear term has slope $2x$ at $x$, which is the coefficient of the term that is linear in $\delta$. The linear term is the derivative: $$ f(x+\delta) = f(x)+f'(x)\delta+\text{higher-order terms in $\delta$}. $$

So, for example, the derivative of $fg$ is obtained by finding the linear terms in \begin{align} (fg)(x+\delta) &=\{ f(x)+f'(x)\delta+\cdots\}\{ g(x)+g'(x)\delta+\cdots\} \\ & = f(x)g(x)+\{f(x)g'(x)+f'(x)g(x)\}\delta+\cdots \end{align} which gives $$ (fg)'(x)=f(x)g'(x)+f'(x)g(x). $$
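One way to play with "keep only the linear term" is to carry a value and a slope around together and drop every $\delta^2$ term when multiplying. The `Linear` class below is a hypothetical helper of my own for illustration, not a standard library type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Linear:
    """Represents f(x) + f'(x)*delta with all higher-order delta terms dropped."""
    value: float
    slope: float

    def __mul__(self, other):
        # (a + b*delta)(c + d*delta) = ac + (ad + bc)*delta + bd*delta^2;
        # the delta^2 term is discarded.
        return Linear(self.value * other.value,
                      self.value * other.slope + self.slope * other.value)

x = 3.0
f = Linear(x ** 2, 2 * x)         # x^2 near x
g = Linear(x ** 3, 3 * x ** 2)    # x^3 near x
fg = f * g
print(fg.value)   # 243.0, which is x**5
print(fg.slope)   # 405.0, which is 5 * x**4
```

The slope of the product is assembled from exactly the two cross terms in the product rule.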

Disintegrating By Parts

I think this is the first answer to explain the product rule. Note that the power rule $\frac{d}{dx}x^n = nx^{n-1}$ can be derived by induction on the product rule with $f(x) = x^{n-1}$ and $g(x) = x$. – David K Oct 03 '15 at 15:04
I think the linear approximation approach is definitely the best in providing a path into more abstract usages, and into differential geometry. – rrogers Oct 07 '15 at 12:54

Nick Alger · Answer 4 · 2015-10-02T22:04:27.143

The intuition for this bothered me for a while too when I first learned about it. The standard argument based on limits and thinking about small changes seemed very mechanical and lacking in insight.

Since the chain rule seems very intuitive to me, what finally satisfied me was the following argument (requires a very small amount of multivariable calc/linear algebra), $$\text{multidimensional chain rule} \implies (\text{derivative of }x^2) =2x.$$ Specifically, take the following functions $g:\mathbb{R}\rightarrow \mathbb{R}^2$, and $f:\mathbb{R}^2 \rightarrow \mathbb{R}$ such that $g$ lifts $x$ into 2 dimensions by making a copy of it, then $f$ brings it back down to one dimension by multiplying the two copies, \begin{align} g(x) &:= \begin{bmatrix}x \\ x\end{bmatrix}, \quad\quad g'(x) = \begin{bmatrix}1 \\ 1\end{bmatrix} \\ f(x,y) &:= x\cdot y, \quad\quad f'(x,y) = \begin{bmatrix}y & x\end{bmatrix} \end{align} The composition of these functions is the 1D function we want, $$f(g(x)) = x^2.$$ By the chain rule, the derivative of the composition is the composition of the derivatives, which is, $$(f \circ g)'(x) = f'(x,x) \circ g'(x) = \begin{bmatrix}x & x\end{bmatrix}\begin{bmatrix}1 \\ 1\end{bmatrix} = 2x.$$

The same technique (lifting to higher dimensions + chain rule) also explains the product rule in general.
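A sketch of the lifting argument for a general product $u(x)\,v(x)$, under the assumption that you supply the two functions and their derivatives yourself; the function name `product_rule_via_lifting` is made up for this illustration.

```python
import math

def product_rule_via_lifting(u, du, v, dv, x):
    """Derivative of u(x)*v(x) as the row vector f' times the column vector g'."""
    lifted = (u(x), v(x))                 # g(x) = (u(x), v(x))
    g_prime = (du(x), dv(x))              # column vector g'(x)
    f_prime = (lifted[1], lifted[0])      # row vector [y, x] evaluated at g(x)
    return f_prime[0] * g_prime[0] + f_prime[1] * g_prime[1]

# x^2: both copies are just x, so the result is x + x = 2x.
square = product_rule_via_lifting(lambda x: x, lambda x: 1.0,
                                  lambda x: x, lambda x: 1.0, 3.0)
print(square)   # 6.0

# sin(x) * exp(x): the same recipe gives cos(x)exp(x) + sin(x)exp(x).
mixed = product_rule_via_lifting(math.sin, math.cos, math.exp, math.exp, 0.5)
print(mixed)
```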

Using concepts from a more advanced level of calculus to find a derivative rule most students would have learned at least a year earlier ... I like it. — David K, Oct 03 '15 at 15:00

score 6 · Answer 5 · answered Oct 02 '15 at 22:03

$$x^n = \underbrace{x\times x\times\cdots\times x}_{\text{$n$ factors}}$$

If you replace $x\longrightarrow x + dx$, and work out the product, then the term proportional to $dx$ will be $n x^{n-1}dx$ because if you pick a $dx$ from a factor you can't pick $x$ from there anymore and there are $n$ places you can choose to pick your $dx$ term from.
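The counting can be done by brute force: list every way of picking either $x$ or $dx$ from each factor and tally how many picks contain exactly one $dx$. A small sketch, with $n=5$ as an arbitrary example:

```python
from collections import Counter
from itertools import product

n = 5
# Each term of the expanded product picks either x or dx from every factor.
choices = product(("x", "dx"), repeat=n)
terms = Counter(pick.count("dx") for pick in choices)

print(terms[0])   # 1: the single x^n term
print(terms[1])   # 5: n terms of the form x^(n-1) dx
```

Terms with two or more $dx$ factors are the higher-order ones that get discarded.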

score 4 · Answer 6 · edited Jun 12 '20 at 10:38

Matt24, it's sad that the world of mathematics has come to a point where many mathematicians, asked for the intuition behind a certain principle in math, respond in the majority with a chalkboard or a textbook chapter full of symbols. ;) Symbols are fine, but they don't substitute for an understanding; you can only successfully symbolize something if you understand it first.

I don't think I can improve on the elegance and the simplicity of Thompson's intuitive explanations of Calculus. This is a very old book, but it's still the best textbook I've ever seen on Calculus. The beginning of Chapter IV precisely answers this question. (Book pages 18 and 19, but they're pages 32 and 33 of the PDF because of the table of contents and other front matter.) http://djm.cc/library/Calculus_Made_Easy_Thompson.pdf

(In the excerpt, I am using ^ for exponentiation and . for multiplication.)

Let us begin with the simple expression y = x^2. Now remember that the fundamental notion about the calculus is the idea of growing. Mathematicians call it varying. Now as y and x^2 are equal to one another, it is clear that if x grows, x^2 will also grow. And if x^2 grows, then y will also grow. What we have got to find out is the proportion between the growing of y and the growing of x. In other words our task is to find out the ratio between dy and dx, or, in brief, to find the value of dy/dx.

Let x, then, grow a little bit bigger and become x+dx; similarly, y will grow a bit bigger and will become y+dy. Then, clearly, it will still be true that the enlarged y will be equal to the square of the enlarged x. Writing this down, we have:

y+dy = (x+dx)^2.

Doing the squaring we get:

y+dy = x^2 + 2x.dx + (dx)^2

What does (dx)^2 mean? Remember that dx meant a bit—a little bit—of x. Then (dx)^2 will mean a little bit of a little bit of x; that is, as explained above (p. 4), it is a small quantity of the second order of smallness. It may therefore be discarded as quite inconsiderable in comparison with the other terms. Leaving it out, we then have:

y+dy = x^2 + 2x.dx

Now y=x^2; so let us subtract this from the equation and we have left

dy=2x.dx

Dividing across by dx, we find

dy/dx = 2x.

Now this is what we set out to find. The ratio of the growing of y to the growing of x is, in the case before us, found to be 2x.

It seems odd to complain about people writing "a page full of symbols", and then as an example of the way it should be done, presenting a passage with a symbol on almost every line, sometimes quite a lot of symbols. The ideas in the answer had been offered already, by the way, although I agree that Thompson expresses them beautifully. (I would say he uses just enough symbols to capture the historical intuitions of the calculus.) — David K, Oct 03 '15 at 14:47
A fair point! In actual fact, the answers posted here are all fairly good. Calculus textbooks and teachers' explanations I have seen often leave a lot to be desired...letting the symbols stand on their own without any explanation. I agree completely with your last statement, that Thompson uses just enough symbols. And I've edited the offending sentence. :) — Wildcard, Oct 03 '15 at 21:21

score 2 · Answer 7 · answered Jul 26 '17 at 17:41

If you look for intuition, I cannot recommend the series "Essence of Calculus" by 3Blue1Brown enough!

flawr

fleablood · Answer 8 · 2015-10-02T17:44:29.210

Okay, I'm not sure if this is what you are asking but this was my intuition when I was a calculus student:

A derivative is a formula for the rate of change at various points of the function. (Assuming the function doesn't jump about or veer sharply.)

The rate of change is the slope of a tangent line to the function at a point.

We find the slope of a line by taking two points and finding the fraction of "the rise over the run".

We don't know the slope of tangent line but if we take two points of the function we can find the slope of that line.

As these two points get really close together, so that they are practically the same point, that line will be the tangent. (Formally, this is ... mush. If they are the same point they aren't two points but one, and the "rise over the run" will be 0/0; but just before that they'll be two points, and the slope will be really, really, really close to the slope of the tangent line.)

So the slope of the tangent line is the rise over the run of these two close points... or in other words $\lim \frac{f(x) - f(y)}{x - y} $ as x and y get really close together.

Well, replace y with x + h and this is $\lim \frac{f(x+h) - f(x)}{h}$ as h goes to 0. The ol' definition for the derivative.

======

Or. What is the derivative of $x^2$ at x? That is, how "fast" is the function growing at x? So how much bigger is $(x + h)^2$ than $x^2$? Well, $(x + h)^2 = x^2 + 2hx + h^2$, so the function has gotten $2hx + h^2$ bigger. The $h^2$ is negligible, so in essence it got $2hx$ bigger. How "long" did it take to get this much bigger? Well, it did it in $h$ units. So it got that much bigger at a rate of $2hx$ units per $h$ units, or simply $2x$.

score 1 · Answer 9 · answered Oct 02 '15 at 18:17

It is wonderful that you are wondering about this. Many people just solve problems by applying formulas automatically without caring about "how" or the history of it all. In fact the history is very interesting too.

Now, your question is broad. For example the part "The problem is that algebra manipulation alone isn't quite convincing to me. Is there any possibility of understanding why the algebra happens to work that way?" is addressed in the principles behind Calculus. Most Calculus books spend some effort on the basics but at the end of the day, the final laws are what get used in practice. Math. students (at least) get to study subjects such as Real analysis and "Analysis of Complex Variables". Such subjects focus on the science behind those magical formulas. You are correct in finding that intuition is not good enough for all this stuff. While you could obtain books in "Analysis" and review such concepts, they are usually written for advanced learners and may not be easy to digest. A good Calculus book should cover the essential concepts well.

As for your point "why do the slopes of the tangent line to the parabola x^2 happen to be determined by 2x?" - A good discussion with pictures can be found here: finding the tangent of a parabola algebraically.

score 1 · Answer 10 · answered Oct 04 '15 at 00:10

The rules of differentiation can all be derived from the definition $$ \frac{{\rm d} }{{\rm d}x} y(x) =\lim_{h\rightarrow0} \frac{1}{h}( y(x+h)-y(x)) $$

Instead of trying to interpret the algebraic results of the rules (i.e. why is there a $2$ in front of the derivative of $x^2$), try to develop a geometric sense for the derivatives.

From the Derivative ≡ Slope interpretation you notice things like: a scalar multiple of a function has a slope with the same scalar factor, and the slope of an even function is an odd function (and vice versa). Try not to focus on why the rules are what they are, but on what they mean.

The rules are just a tool (a shortcut, if you like) to spare you from doing the above limit every time. Some of the rules themselves can be derived by induction from more basic rules (like for $x^n$), whilst trying to derive them from the limit is just a lot more complex.

Brian Tung · Answer 11 · 2015-10-02T21:33:13.067

Some visual images:

For $\frac{d}{dx} x^k = kx^{k-1}$:
- A square of side $x$ has one corner fixed at the origin. The square grows to the upper right, by an amount proportional to the length of the upper and right sides, whose combined lengths are $2x$.
- A cube of side $x$ has one corner fixed at the origin. The cube grows to the back upper right, by an amount proportional to the areas of the upper, right, and back faces, whose combined areas are $3x^2$.
- And so on$\ldots$
For $\frac{d}{dx} \sin x = \cos x, \frac{d}{dx} \cos x = -\sin x$: The values of sin and cos for any $x$ are represented by a point on the unit circle at argument (that is, angle) $x$. The direction of change, however, is the tangent to that point in the counter-clockwise direction, which equals the sin and cos of a point $\pi/2$ radians on. So $\frac{d}{dx} \sin x = \sin (x+\pi/2) = \cos x$, and $\frac{d}{dx} \cos x = \cos (x+\pi/2) = -\sin x$.
For $\frac{d}{dx} e^x = e^x$: I'd love to come up with a transferable visceral intuition for this. I'll have to give this more thought. ETA: Best I can come up with is the usual compound interest argument. Imagine I have $\$100$ invested in an account earning $100$ percent interest. (I said "imagine".) If it's compounded annually, then I just end up with $\$100$, times $1+1 = 2$, or $\$200$. If it's compounded semi-annually, then I end up with $\$100$, times $1+1/2 = 3/2$, times $3/2$ again, or $\$225$. If it's $k$ times a year, then it's $\$100$, times $1+1/k$, times $1+1/k$, etc., a total of $k$ times, or $\$100$, times $(1+1/k)^k$. The limit, as $k \to \infty$ is $\$100$, times $e$. So $\frac{dy}{dx} = y$ leads to $y(x+1) = e \times y(x)$, or $y(x) = Ce^x$.
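The compound interest numbers are quick to reproduce. A minimal sketch, where `balance` is a hypothetical helper for the end-of-year amount on $\$100$ at $100$ percent interest compounded `k` times:

```python
import math

def balance(k, principal=100.0):
    """Year-end balance at 100 percent interest compounded k times a year."""
    return principal * (1 + 1 / k) ** k

for k in (1, 2, 12, 365, 10**6):
    print(k, balance(k))

print(100 * math.e)   # the limiting value as k grows
```

The balances climb from 200 and 225 toward $100e$, but never past it.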

The function $y(x) = e^x$ is defined by $y' = y$ (and $y(0) = 1$, for normalization). Of course, that's not the only possible definition, but it's a useful and convenient one. — anomaly, Oct 02 '15 at 19:05
Oh, yes. But that just flips the question on its head: If you treat the property of its derivative equaling itself as the definition, then why does it come out as an exponential function with that particular base? — Brian Tung, Oct 02 '15 at 19:08
Uniqueness: The function $\tilde y(x) = y(x + t)/y(t)$ also satisfies $\tilde y' = \tilde y$ and $\tilde y(0) = 1$, so $y(x + t) = y(x)y(t)$. (One has to be a bit careful about showing that the solutions are defined for all time, are positive, etc.) — anomaly, Oct 02 '15 at 19:11

user2958652 · Answer 12 · 2015-10-03T23:43:42.113

Chain Rule: We want the derivative of $f(g(x))$ with respect to $x$. Derivatives, of course, are taken by minuscule changes in the target variable ($\frac{df(x)}{dx}$ is small change in rise over small change in run). We assume that $g(x)$ is continuous and differentiable, and thus a small enough change in $x$ will result in a small change in $g(x)$. We can then treat $g(x)$ as a variable, letting $u=g(x)$, then $\frac{df(u)}{dx} = \frac{df(u)}{dx} \cdot \frac{du}{du} = \frac{df(u)}{du} \cdot \frac{dg(x)}{dx}$.

Rule that $\frac{d}{dx}x^2=2x$: This one can be related to simple geometry by the fundamental theorem of calculus (FTC). The area of a right triangle is $\frac{1}{2}bh$. Graphically, this is interpreted as $\frac{1}{2}xf(x)$. Recall that $f(x)$ is a linear function, i.e. $f(x)=mx$ with slope $m=\frac{h}{b}$. In other words, the area of the triangle formed under linear function $f(x)=mx$ is given by $\frac{1}{2}xf(x) = \frac{1}{2}mx^2$. By the FTC, the derivative of $\frac{1}{2}mx^2$ is the function $mx$. The generalized rule, $\frac{d}{dx}x^n = nx^{n-1}$, is best understood algebraically (as shown in the other answers).
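The triangle argument can be sanity-checked by treating the area under $f(x)=mx$ as a function of $x$ and estimating how fast it grows. The slope value `m = 3.0` and the helper names are assumptions made for this sketch.

```python
def triangle_area(x, m=3.0):
    """Area under the line y = m*x between 0 and x: half base times height."""
    base, height = x, m * x
    return 0.5 * base * height

def slope(f, x, eps=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 2.0
rate = slope(triangle_area, x)
print(rate)   # close to m * x = 6, the height of the line at x
```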

Inverse function rule: Nicely understood graphically. So basically $\frac{d}{dx}f^{-1}(x) = \frac{1}{f'(f^{-1}(x))}$. If you know that the derivative of $f(x)=e^x$ is $e^x$ then you can use the inverse rule to derive the derivative of $\ln(x)$:

$$\frac{d}{dx}\ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}$$

It's not graphical, but you can use logarithms for a short pseudoproof of the power rule:

$$y=x^n \Rightarrow \ln(y)=n \ln(x) \Rightarrow \frac{y'}{y}=\frac{n}{x} \Rightarrow y'=n\frac{x^n}{x} \Rightarrow y'=nx^{n-1}$$

Basic rules: Some of the most basic rules (addition rule, product rule, quotient rule) are difficult to understand graphically, but easily follow algebraically from the definition of derivative and integral.

Also, with the product rule you can derive the (integer) power rule: Assume we know the derivative of $x^n$ is $nx^{n-1}$ and that we know the derivative of $x$ is $1$. Well $x^{n+1}=x\cdot x^n$, so $$\frac{d}{dx}x^{n+1} = \frac{d}{dx}(x\cdot x^n) = x^n\frac{d}{dx}x+x\frac{d}{dx}x^n = x^n+x\cdot nx^{n-1} = (1+n)x^n$$

But once again, the algebraic explanation that was given in other answers $$\lim_{\delta \rightarrow 0}\frac{(x+\delta)^n - x^n}{\delta} = \lim_{\delta \rightarrow 0}\frac{(x^n+{n\choose 1}x^{n-1}\delta+{n\choose 2}x^{n-2}\delta ^2+\ldots)-x^n}{\delta} = \lim_{\delta \rightarrow 0}\frac{{n\choose 1}x^{n-1}\delta+{n\choose 2}x^{n-2}\delta ^2+\ldots}{\delta} = \lim_{\delta \rightarrow 0}\left({n\choose 1}x^{n-1}+{n\choose 2}x^{n-2}\delta+\ldots\right) = {n\choose 1}x^{n-1} = nx^{n-1}$$ is a pretty good explanation.

score 0 · Answer 13 · answered Oct 03 '15 at 23:38

All of the derivative rules come from looking at the corresponding linear approximations.

At a point $p$, $f(x) \approx f(p)+f'(p)(x-p)$, and $g(x) \approx g(p)+g'(p)(x-p)$.

So it makes sense that the sum of the functions is well approximated by the sum of the tangent lines, and thus that the slopes sum.

Constant multiples work via the same reasoning.

The chain rule works like this:

$\begin{align*} f(g(x)) &\approx f(g(p)+g'(p)(x-p)) \text{ using linear approx of $g$ at $p$}\\ &\approx f(g(p))+f'(g(p))g'(p)(x-p) \text{ using linear approx of $f$ at $g(p)$} \end{align*}$

so the slope of $f \circ g$ at $p$ should be $f'(g(p))g'(p)$.

The only rule that does not work like this ("tangent line to sum is sum of tangent lines", "tangent line to composition is composition of tangent lines", etc.) is the product rule.

The problem is, when we multiply tangent lines we get a parabola, not a line. But it is okay, because we take the tangent line to that parabola in a fairly intuitive way:

$\begin{align*} f(x)g(x) &\approx (f(p)+f'(p)(x-p))(g(p) + g'(p)(x-p))\\ &=f(p)g(p)+(f'(p)g(p)+f(p)g'(p))(x-p)+f'(p)g'(p)(x-p)^2 \end{align*}$

But it is visually obvious that $(x-p)^2$ has zero slope at $p$, so the tangent line must just be $y=f(p)g(p)+(f'(p)g(p)+f(p)g'(p))(x-p)$. This yields the product rule.

Interestingly, this approach sheds new (?) light on the derivative of $f(x) =x^2$. Namely

\begin{align*} x^2 &= \left((x-p)+p\right)^2\\ &=p^2+2p(x-p)+(x-p)^2 \end{align*}

Since $(x-p)^2$ surely has zero slope at $p$, we can see that the slope of $f(x) = x^2$ at $p$ should be $2p$.
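To see that the leftover $(x-p)^2$ piece really does not affect the slope, you can compare a product with the candidate tangent line near $p$. This is a sketch with $f=\sin$, $g=\exp$ and $p=1$ picked arbitrarily.

```python
import math

p = 1.0
f, df = math.sin, math.cos
g, dg = math.exp, math.exp

def tangent_of_product(x):
    """Candidate tangent line to f*g at p, built from the product rule slope."""
    return f(p) * g(p) + (df(p) * g(p) + f(p) * dg(p)) * (x - p)

for step in (1e-1, 1e-2, 1e-3):
    x = p + step
    error = f(x) * g(x) - tangent_of_product(x)
    # error behaves like a multiple of step**2, so error / step shrinks with step
    print(step, error / step)
```

If the slope in `tangent_of_product` were wrong, `error / step` would settle at a nonzero value instead of shrinking.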
