I have just learned about the chain rule, but my book doesn't give a proof. I tried to write one myself but couldn't. Can someone please explain the proof of the chain rule in elementary terms? I have only just started learning calculus.
4 Answers
Assuming everything behaves nicely ($f$ and $g$ are differentiable, and $g(x) \neq g(a)$ when $x$ is close to $a$ but $x \neq a$), the derivative of $f(g(x))$ at the point $x = a$ is given by $$ \lim_{x \to a}\frac{f(g(x)) - f(g(a))}{x-a} = \lim_{x\to a}\frac{f(g(x)) - f(g(a))}{g(x) - g(a)}\cdot \frac{g(x) - g(a)}{x-a}, $$ where the right-hand side becomes $f'(g(a))\cdot g'(a)$, by definition of the derivative.
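For a quick illustration of the formula in action (a standard example, not part of the proof sketch above): take $f(u) = \sin u$ and $g(x) = x^2$. Then $$ \frac{d}{dx}\,\sin(x^2) = f'(g(x))\cdot g'(x) = \cos(x^2)\cdot 2x. $$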

- There's a more complicated derivation that avoids having to treat separately the case where $g(x)=g(a)$ for $x$ in a neighborhood of $a$. Some authors, such as the one quoted here, point out an "incorrect argument" that looks like the proof above, but I think the objection applies only if you forget to specify that $g(x)\neq g(a)$. – David K Mar 08 '15 at 13:26
- You still need to deal with the case where $g(x) = g(a)$ as $x\to a$, and that is the part which requires some effort; otherwise it's just plain algebra of limits. Most authors deal with this case in overcomplicated ways. One just needs to remark that in this case $g'(a) = 0$ and use it to prove that $(f\circ g)'(a) = 0$. – Paramanand Singh Jun 13 '17 at 10:09
- @Arthur Is it correct to prove the rule using two cases: one where $g'(x)$ is zero (and as such the derivative of the composition is zero), and the other where it isn't, so that the reciprocal $1/g'(x)$ exists (the case you presented)? It seems to work, but I wonder, because I haven't seen a proof done that way. – Dole Feb 08 '19 at 20:27
- @ParamanandSingh Just for completeness, the $g'(a) = 0$ case is written out in detail at https://math.stackexchange.com/a/2492797 – Joshua P. Swanson Oct 17 '23 at 22:27
One approach is to use the fact that "differentiability" is equivalent to "approximate linearity", in the sense that if $f$ is defined in some neighborhood of $a$, then $$ f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\quad\text{exists} $$ if and only if $$ f(a + h) = f(a) + f'(a) h + o(h)\quad\text{at $a$ (i.e., "for small $h$").} \tag{1} $$ (As usual, "$o(h)$" denotes a function satisfying $o(h)/h \to 0$ as $h \to 0$.)
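To see characterization (1) concretely (an illustration of my own): for $f(x) = x^2$ we have $$ f(a + h) = (a+h)^2 = a^2 + 2a\,h + h^2 = f(a) + 2a\,h + o(h), $$ since $h^2/h = h \to 0$; reading off the coefficient of $h$ recovers $f'(a) = 2a$.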
If $f$ is differentiable at $a$ and $g$ is differentiable at $b = f(a)$, and if we write $b + k = y = f(x) = f(a + h)$, then $$ k = y - b = f(a + h) - f(a) = f'(a) h + o(h), $$ so $k = O(h)$, and hence $o(k) = o(h)$: any quantity negligible compared to $k$ is also negligible compared to $h$. Now we simply compose the linear approximations of $g$ and $f$: \begin{align*} f(a + h) &= f(a) + f'(a) h + o(h), \\ g(b + k) &= g(b) + g'(b) k + o(k), \\ (g \circ f)(a + h) &= (g \circ f)(a) + g'\bigl(f(a)\bigr)\bigl[f'(a) h + o(h)\bigr] + o(k) \\ &= (g \circ f)(a) + \bigl[g'\bigl(f(a)\bigr) f'(a)\bigr] h + o(h). \end{align*} (In the last step, $g'\bigl(f(a)\bigr)\,o(h) + o(k) = o(h)$, because a constant multiple of $o(h)$ is $o(h)$, and $o(k) = o(h)$ as noted above.) Since the right-hand side has the form of a linear approximation, (1) implies that $(g \circ f)'(a)$ exists, and is equal to the coefficient of $h$, i.e., $$ (g \circ f)'(a) = g'\bigl(f(a)\bigr) f'(a). $$ One nice feature of this argument is that it generalizes with almost no modifications to vector-valued functions of several variables.
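As a sanity check on the bookkeeping (a worked example of my own, not from the argument above): take $f(x) = x^2$, $g(y) = y^3$, and $a = 1$, so $b = f(1) = 1$. Then $$ \begin{aligned} f(1 + h) &= 1 + 2h + h^2, &&\text{so } k = 2h + o(h), \\ g(1 + k) &= 1 + 3k + 3k^2 + k^3 = 1 + 3k + o(k), \\ (g \circ f)(1 + h) &= (1 + h)^6 = 1 + 6h + o(h), \end{aligned} $$ and reading off the coefficient of $h$ gives $(g \circ f)'(1) = 6 = g'\bigl(f(1)\bigr)\,f'(1) = 3 \cdot 2$, as the chain rule predicts.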

- Given the way $h$ and $k$ are related, we still have to deal with the case where $k=0$ as $h\to 0$, and verify in this case that $o(k) = o(h)$. This is not difficult but is crucial to the overall proof. – Paramanand Singh Jun 13 '17 at 10:15
- What happens in the third linear approximation that allows one to go from line 1 to line 2? I don't understand where the $o(k)$ goes. – Michael Andrew Bentley Oct 10 '17 at 14:47
-
First, let me give a careful statement of the chain rule:
If $g:\mathbb R\to\mathbb R$ is differentiable at $a$, and $f:\mathbb R\to\mathbb R$ is differentiable at $g(a)$, then $f \circ g$ is differentiable at $a$, and $$ (f \circ g)'(a) = f'(g(a)) \cdot g'(a). $$
Now for the proof. We first define an auxiliary function $\phi:\mathbb R\to\mathbb R$ as follows: $$ \phi(t)=\begin{cases} \dfrac{f(t)-f(g(a))}{t-g(a)}&\text{if $t\neq g(a),$} \\[5pt] f'(g(a))&\text{if $t=g(a)$.} \end{cases} $$ By construction, $\phi$ is continuous at $g(a)$: the statement $\lim_{t\to g(a)}\phi(t)=\phi(g(a))$ is exactly the statement that the difference quotient of $f$ tends to $f'(g(a))$, i.e., that $f$ is differentiable at $g(a)$. Moreover, $g$ is continuous at $a$ because it is differentiable at $a$. Hence, $\phi \circ g$ is continuous at $a$. We use this below on line $\eqref{*}$.
Note that for all $x\neq a$, $$ \frac{f(g(x))-f(g(a))}{x-a}=\phi(g(x)) \cdot \frac{g(x)-g(a)}{x-a}. $$ (This is true even if $g(x)=g(a)$, as in that case both sides of the equation are equal to $0$.) Hence, \begin{align} (f \circ g)'(a)&=\lim_{x \to a}\frac{f(g(x))-f(g(a))}{x-a} \\[5pt] &= \lim_{x \to a}\phi(g(x)) \cdot \lim_{x \to a}\frac{g(x)-g(a)}{x-a} \\[5pt] &= \phi(g(a)) \cdot g'(a) \tag{*}\label{*} \\[5pt] &= f'(g(a)) \cdot g'(a), \end{align} as claimed.
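To make the auxiliary function concrete (an illustration of my own, not Spivak's): take $f(t) = t^2$ and write $c = g(a)$. Then $$ \phi(t)=\begin{cases} \dfrac{t^2-c^2}{t-c} = t + c &\text{if $t\neq c,$} \\[5pt] 2c &\text{if $t=c$,} \end{cases} $$ so $\phi$ is simply the continuous function $t \mapsto t + c$: the value assigned at $t = c$ exactly fills the removable singularity of the difference quotient, which is what the general construction does for any $f$ differentiable at $g(a)$.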
For an explanation of how this proof was motivated, see chapter 10 of Michael Spivak's Calculus. (My definition of the function $\phi$ differs from Spivak's, though the difference is not major.)
Remark: although we assume that $f$ and $g$ are both maps from $\mathbb R$ to $\mathbb R$, this is mostly just for ease of notation; with a few minor modifications, the same proof works in the case that $g$ is merely assumed to be defined in an open interval containing $a$, and $f$ is defined in an open interval containing $g(a)$.

- How do you claim that the auxiliary function will be continuous at $g(a)$? – L lawliet Oct 05 '23 at 19:39
- @Llawliet: Note that $\phi(g(a))=f'(g(a))=\lim_{t\to g(a)}\frac{f(t)-f(g(a))}{t-g(a)}=\lim_{t\to g(a)}{\phi(t)}$, with the last equality holding because $\frac{f(t)-f(g(a))}{t-g(a)}=\phi(t)$ for all $t\neq g(a)$. Does that help? – Joe Oct 05 '23 at 20:40
As suggested by @Marty Cohen in [1], I went to [2] to find a proof. Under fair use, I include Hardy's proof here (more or less verbatim).
We write $f(x) = y$ and $f(x+h) = y+k$, so that $k\rightarrow 0$ as $h\rightarrow 0$ and $$ \frac{k}{h} \rightarrow f'(x). \tag{*} $$ We must now distinguish two cases.
I. Suppose that $f'(x) \neq 0$, and that $h$ is small, but not zero. Then $k\neq 0$ by $(*)$, and \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h} &= \dfrac{F(y+k) - F(y)}{k}\,\dfrac{k}{h} \rightarrow F'(y)\,f'(x). \end{align*}
II. Suppose that $f'(x) = 0$, and that $h$ is small, but not zero. There are now two possibilities:
II.A. If $k=0$, then \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&= \frac{F\left\{f(x+h)\right\}-F\left\{f(x )\right\}}{h} \\ &= \frac{F\left\{y\right\}-F\left\{y\right\}}{h} \\ &= \dfrac{0}{h} \\ &= 0 = F'(y)\,f'(x) \end{align*}
II.B. If $k\neq 0$, then \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&= \frac{F\left\{f(x+h)\right\}-F\left\{f(x)\right\}}{k}\,\dfrac{k}{h}. \end{align*} The first factor is nearly $F'(y)$, and the second is small because $k/h\rightarrow f'(x) = 0$. Hence $\dfrac{\phi(x+h) - \phi(x)}{h}$ is small in either case, and \begin{align*} \dfrac{\phi(x+h) - \phi(x)}{h}&\rightarrow 0 = F'(y)\,f'(x). \end{align*}
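To see case II in action (an example of my own, not Hardy's): take $f(x) = x^2$ at $x = 0$, so that $y = f(0) = 0$ and $f'(0) = 0$, and let $F$ be any function differentiable at $0$. Here $k = h^2 \neq 0$ for $h \neq 0$, so we are in case II.B, and $$ \dfrac{\phi(h) - \phi(0)}{h} = \dfrac{F(h^2) - F(0)}{h^2}\cdot\dfrac{h^2}{h} \rightarrow F'(0)\cdot 0 = F'(y)\,f'(0), $$ exactly as the argument predicts.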
Bibliography
[2] G. H. Hardy, "A Course of Pure Mathematics," 10th ed., Cambridge University Press, 1960, p. 217.
