6

Is there a simple and intuitive way to prove the chain rule, that is, if $y$ is a function of $u$ and $u$ is a function of $x$, then why is $\frac{dy}{dx}$ = $\frac{dy}{du}$ $\cdot$ $\frac{du}{dx}$ ? This could just be an intuitive argument.

PS: The only proofs I found were based on confusing definitions.

tc216
  • If you want a real proof, as opposed to just an intuitive explanation, I don't think you can avoid some technicalities. See this answer, for example: http://math.stackexchange.com/a/1480979/1242. – Hans Lundmark Dec 15 '16 at 07:36
  • The intuitive derivation of the chain rule that I described here can be converted to a rigorous proof in a fairly straightforward way. – littleO Jan 05 '17 at 00:57
  • @HansLundmark Can you check my proof, please? –  Jan 05 '17 at 07:52

7 Answers

6

If $\frac{du}{dx}=k\neq 0$ at some $x$, then a first-order (that is, linear) approximation of $du$ close to $x$ gives $$ du=k\cdot dx \Rightarrow \frac{1}{du}=\frac{1}{k\cdot dx} $$ thus: $$ \frac{dy}{du}\cdot\frac{du}{dx}=\frac{dy}{k\cdot dx}\cdot k=\frac{dy}{dx} $$ Intuitively, you should think of differentials as "small changes": so small that even a linear approximation is good enough.
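The cancellation above can be sanity-checked numerically. A minimal sketch, assuming the hypothetical example $y=\sin u$, $u=x^2$, evaluated at $x=1$:

```python
import math

# Hypothetical example: y = sin(u), u = x^2, evaluated at x = 1.
x = 1.0
dx = 1e-6                        # a "small change" in x

u = x**2
du = (x + dx)**2 - u             # the corresponding small change in u
dy = math.sin(u + du) - math.sin(u)

lhs = dy / dx                    # direct difference quotient for dy/dx
rhs = (dy / du) * (du / dx)      # (dy/du) * (du/dx): the du's cancel

exact = math.cos(x**2) * 2 * x   # what the chain rule predicts
print(lhs, rhs, exact)
```

For finite nonzero changes the cancellation is exact up to rounding; both ratios differ from the true derivative only by the error of the linear approximation.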

KonKan
4

If you want intuitive and simple:

$$\frac{dy}{dx}=\frac{dy}{\color{#4499de}{du}}\frac{\color{#4499de}{du}}{dx}$$

where the $du$'s cancel out.


If you want to be more rigorous, replace $dy,dx,du$ with $\Delta y,\Delta x,\Delta u$, the corresponding changes in $y,x,u$, and take the limit as $\Delta x\to0$, under which the ratios become derivatives.
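This cancellation already holds exactly for finite differences, as long as $\Delta u \neq 0$. A small sketch with the hypothetical choice $u=3x+1$, $y=u^2$:

```python
# Hypothetical example: u = 3x + 1, y = u^2, with a *finite* step dx.
x, dx = 2.0, 0.5                     # dx need not be small for the cancellation

u0, u1 = 3*x + 1, 3*(x + dx) + 1
y0, y1 = u0**2, u1**2
du, dy = u1 - u0, y1 - y0

ratio_direct = dy / dx               # Δy/Δx
ratio_chain = (dy / du) * (du / dx)  # (Δy/Δu)(Δu/Δx): the Δu's cancel
print(ratio_direct, ratio_chain)
```

Taking the limit $\Delta x\to 0$ then turns both sides into the derivative statement, except in the delicate case $\Delta u = 0$ discussed in the comments below.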

  • You mean, using differentials? – tc216 Dec 15 '16 at 00:37
  • @jeff Yes? I think it's the most obvious way of doing it. – Simply Beautiful Art Dec 15 '16 at 00:38
  • The second argument is still not rigorous. For a real proof, you need to deal with the problem that $\Delta u$ may equal zero for $\Delta x$ arbitrarily close to zero. – Hans Lundmark Dec 15 '16 at 07:30
  • @HansLundmark The chain rule simply does not hold in that case, I think. – Simply Beautiful Art Dec 15 '16 at 12:00
  • Yes, it certainly does! The assumptions for $f \circ g$ to be differentiable at a point $a$ are just that $f$ is differentiable at $g(a)$ and $g$ is differentiable at $a$. (If we in the definition of “differentiable” also include that the function should be defined in an interval around the point in question.) But the proof, under these assumptions only, is a bit trickier than one might expect. – Hans Lundmark Dec 15 '16 at 13:15
  • @SimpleArt If $\frac{du}{dx}$ exists and $\Delta u = 0$ for values of $\Delta x$ arbitrarily close to zero, then $\frac{du}{dx}=0$ and the chain rule gives $\frac{dy}{dx}=0$. An unsatisfying thing about the standard proof of the chain rule is that this case has to be treated separately. – Ian Dec 15 '16 at 14:35
  • @HansLundmark thanks for pointing that out. – Simply Beautiful Art Dec 15 '16 at 14:57
  • @Ian but I do think one might have $dy/du$ be undefined. – Simply Beautiful Art Dec 15 '16 at 14:59
  • @SimpleArt Nope; this is a flaw in the differential notation. In function notation the chain rule takes the less intuitive form $(f \circ g)'(x)=f'(g(x)) g'(x)$. The expression "$f'(g(x))$" couldn't care less what $g'(x)$ is; to that expression, the only thing that matters is the value of $g(x)$. Intuitively, $dy/du$ is still whatever it would be if $u$ changed, but as it happens $u$ isn't changing. – Ian Dec 15 '16 at 15:22
3

Some intuition: If $f(x) = m_1x + b_1, g(x) =m_2x + b_2,$ then $(g\circ f)(x) = m_2m_1x + (m_2b_1 + b_2).$ So in the case of linear functions, the slope of their composition is the product of the slopes. Now if $f,g$ are differentiable at $a, f(a)$ respectively, we can expect that, near $a,$ $g\circ f$ is very close to the composition of their tangent lines. Thus the slope of their composition at $a$ should be the product of the two slopes, i.e., $(g\circ f)'(a) = g'(f(a))\cdot f'(a).$
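A quick numeric sketch of the linear case, with hypothetical slopes $m_1=3$, $m_2=-5$:

```python
# Composition of two linear functions: slopes multiply.
def f(x): return 3*x + 2         # slope m1 = 3
def g(x): return -5*x + 7        # slope m2 = -5

def gof(x): return g(f(x))       # g∘f; expected slope m2*m1 = -15

slope = (gof(1.0) - gof(0.0)) / (1.0 - 0.0)
print(slope)                     # -15.0
```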

zhw.
2

This answer is more intuition than an actual proof, but it may be helpful if you're learning the chain rule. The derivative of $y$ with respect to $x$ tells you how fast $y$ is changing as $x$ changes. If $x$ is changing 3 times as fast as $t$, and $y$ is changing 2 times as fast as $x$, then $y$ is changing 6 times as fast as $t$. This is not a proof, but it gives you an idea of why it should be true. In the example above, $dy/dx$ and $dx/dt$ are both constant, so $y(x)$ and $x(t)$ are linear functions; for general differentiable functions, near each value of $t$ and $x$ the graphs of $x(t)$ and $y(x)$ are still "basically" lines.
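The rate picture can be made concrete with the hypothetical linear choices $x(t)=3t$ and $y(x)=2x$:

```python
# x changes 3 times as fast as t, and y changes 2 times as fast as x,
# so y should change 6 times as fast as t.
def x_of_t(t): return 3*t
def y_of_x(x): return 2*x

t0, dt = 1.0, 0.25
dy = y_of_x(x_of_t(t0 + dt)) - y_of_x(x_of_t(t0))
print(dy / dt)                   # 6.0
```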

1

The best intuition, in my opinion, comes from the notion of a differential. To each scalar variable $v$, there is a corresponding differential $\mathrm{d}v$.

Among the things you can do with differentials are:

  • Add them
  • Multiply a differential by a scalar

If it turns out two differentials are related by an equation $$\mathrm{d}v = w \, \mathrm{d}u $$ then $w$ is* determined by this equation, and it makes sense to define the ratio $$ \frac{\mathrm{d}v}{\mathrm{d}u} = w$$

Of course, if $f$ is differentiable, then we have the related differentials $$\mathrm{d}(f(u)) = f'(u) \, \mathrm{d}u $$

If all of the ratios involved are defined, we can compute

$$ \frac{\mathrm{d}y}{\mathrm{d}x} \, \mathrm{d}x = \mathrm{d}y = \frac{\mathrm{d}y}{\mathrm{d}u} \mathrm{d}u = \frac{\mathrm{d}y}{\mathrm{d}u} \frac{\mathrm{d}u}{\mathrm{d}x} \mathrm{d}x$$

and conclude

$$ \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y}{\mathrm{d}u} \frac{\mathrm{d}u}{\mathrm{d}x} $$

*: There are caveats involved; e.g. this equation doesn't tell us anything about $w$ in a region where $u$ is locally constant (and so $\mathrm{d}u = 0$)


This sort of calculation only works well in one dimension, i.e., where all of the variables involved are related. When you have multiple independent variables, differentials still make sense, but ratios usually don't.

For example, on the plane, $\mathrm{d}x$ and $\mathrm{d}y$ are both well-defined, but neither is a scalar multiple of the other; one can't make sense out of "dividing" by $\mathrm{d}x$, except in special cases.
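The relation $\mathrm{d}v = w \, \mathrm{d}u$ can be illustrated numerically; a rough sketch, assuming the hypothetical choice $v=u^2$ (so $w=2u$):

```python
# For v = u^2 the differentials satisfy dv = 2u du (to first order),
# so the ratio of small changes dv/du should recover w = 2u.
u, du = 3.0, 1e-6
dv = (u + du)**2 - u**2          # actual small change in v
w = dv / du
print(w)                         # approx 2*u = 6, up to an O(du) error
```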

0

$\frac {d(f(g(x)))}{d(g(x))}*\frac{d(g(x))}{dx}=$

$\lim_{H_h = g(x+h)- g(x)\rightarrow 0} \frac {f(g(x) + H_h) -f(g(x)) }{H_h}*\lim_{h= x+h - x = h\rightarrow 0} \frac{g(x+h) - g(x)}h$

$= \lim_{h\rightarrow 0}\frac {f(g(x) + H_h) -f(g(x)) }{H_h}*\lim_{h\rightarrow 0} \frac{g(x+h) - g(x)}h$

$= \lim_{h\rightarrow 0}\frac {f(g(x) + g(x+h) - g(x)) -f(g(x)) }{H_h}* \frac{g(x+h) - g(x)}h$

$= \lim_{h\rightarrow 0}\frac {f(g(x+h)) -f(g(x)) }{H_h}* \frac{H_h}h$

$= \lim_{h\rightarrow 0}\frac {f(g(x+h)) -f(g(x)) }h$

$= \frac {d(f(g(x)))}{dx}$

(See comments. If $H_h = g (x+h)-g(x)=0$ for all $0 < h < \delta $ for some $\delta $ then those limits don't really work.)

(But then $g'(x) = \lim\frac {g(x+h)-g(x)}h=\lim \frac 0h =0$ and $g(x)=g(x+h)$, so $\frac {d(f(g(x)))}{dx}=\lim \frac {f(g(x+h))-f(g(x))}h=\lim\frac {f(g(x))-f(g(x))}h=\lim \frac 0h=0 = \frac {d(f(g(x)))}{d(g(x))}*0=\frac {d(f(g(x)))}{d(g(x))}*\frac {d(g(x))}{dx}$, so that case is trivial.)

===

This is highly abusive, but for small values of $h \ne 0 $:

$f'(x)\approx \frac {f (x+h)-f (x)}h $ so

$hf'(x)\approx f (x+h) - f (x) $

$f (x)\approx f (x+h)-hf'(x) $ and

$f (x+h)\approx f (x)+hf'(x)$

and these all hold true even for $h=0$ (though they are rough approximations)

And so for small values of $h $ and small values of $k=hg'(x) $

$h(f\circ g)'(x)\approx$

$f (g (x+h))-f (g (x))\approx $

$f (g (x)+hg'(x))-f (g (x))\approx $ (for small $k=hg'(x)$)

$kf'(g (x))\approx $

$hg'(x)f '(g (x))$

So $(f\circ g)'(x)\approx g'(x)f' (g (x)) $.

Lots more rigor is required to make those valid limit expressions ($h$ can't equal $0$ but $g'(x)$ might, etc.), but that is the intuition.
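The approximation chain above can be checked numerically; a sketch with the hypothetical choice $f=\exp$, $g=\sin$ at $x=0.5$:

```python
import math

# Check that f(g(x+h)) - f(g(x)) is approximately h * g'(x) * f'(g(x))
# for small h.
f, fprime = math.exp, math.exp   # f = exp, so f' = exp
g, gprime = math.sin, math.cos   # g = sin, so g' = cos

x, h = 0.5, 1e-5
lhs = f(g(x + h)) - f(g(x))
rhs = h * gprime(x) * fprime(g(x))
print(lhs / rhs)                 # approx 1 for small h
```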

fleablood
0

I think this proof is intuitive.

$$g(x) = G(f(x))$$

Let $y = G(t)$, $t = f(x)$, and $y = g(x)$.

$\Delta y, \Delta x, \Delta t, \varepsilon$ are infinitesimals.

By the increment theorem, $$\Delta y = G^\prime(t)\Delta t + \varepsilon\Delta t$$ Dividing by $\Delta x$: $${\Delta y \over \Delta x} = G^\prime(t){\Delta t\over \Delta x} + \varepsilon{\Delta t\over \Delta x}$$ Taking standard parts, and using that $\varepsilon$ is infinitesimal so $st\left(\varepsilon{\Delta t\over \Delta x}\right)=0$: $$st\left({\Delta y \over \Delta x}\right) = G^\prime(t)\,st\left({\Delta t\over \Delta x}\right) $$ $${\mathrm dy \over \mathrm d x}= G^\prime(t)\left({\mathrm dt\over \mathrm dx}\right) $$ $$\bbox[#F85, 5px, Border: 3px solid green] {g^\prime(x)= G^\prime(f(x))f^\prime (x)} $$
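A finite-$\Delta$ analogue of the increment theorem can be checked numerically; a sketch assuming the hypothetical choice $G=\cos$, $f(x)=x^2$ at $x=1$:

```python
import math

# Finite-Δ analogue of the increment theorem: Δy = G'(t)Δt + εΔt,
# where ε shrinks as Δt does.
x, dx = 1.0, 1e-6
t = x**2                             # t = f(x) = x^2
dt = (x + dx)**2 - t                 # Δt
dy = math.cos(t + dt) - math.cos(t)  # Δy for y = G(t) = cos(t)

eps = dy/dt - (-math.sin(t))     # ε = Δy/Δt - G'(t); tiny for small Δt
print(eps)
print(dy / dx)                   # approx g'(x) = -sin(x^2) * 2x
```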