
Let $f$ and $g$ be two differentiable functions on $]a, b[$. Then $f \circ g$ is differentiable on the same interval, and we have the formula:

$$(f \circ g)' = g' \cdot (f' \circ g)$$

How do you prove this?

dtldarek
Cydonia7

6 Answers

6

If you are familiar with o-notation, here is an alternative proof: Note that when $f$ is differentiable at $x$, then $$f(x+h)=f(x)+hf'(x)+o(h)\qquad\text{as } h\to0,$$ and this can be used as the definition of $f'(x)$. Now $$\begin{align*} (f\circ g)(x+h) &=f\bigl(g(x+h)\bigr)=f\bigl(g(x)+hg'(x)+o(h)\bigr)\\ &=f\bigl(g(x)\bigr)+\bigl(hg'(x)+o(h)\bigr)f'\bigl(g(x)\bigr)+o\bigl(hg'(x)+o(h)\bigr)\\ &=f\bigl(g(x)\bigr)+hg'(x)f'\bigl(g(x)\bigr)+o(h),\end{align*}$$ and the proof is complete.

If this makes no sense to you at present, come back to it after you have learned about o-notation, and you will appreciate it much more.
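(A quick numerical illustration, separate from the proof: with $f=\sin$ and $g=\exp$ chosen arbitrarily, one can check that the remainder $(f\circ g)(x+h)-f(g(x))-hg'(x)f'\bigl(g(x)\bigr)$ really is $o(h)$, i.e. that it vanishes faster than $h$.)

```python
import math

# Numerical illustration (assumed example: f = sin, g = exp).  If the
# chain rule holds, (f∘g)(x+h) - f(g(x)) - h·g'(x)·f'(g(x)) is o(h),
# so dividing it by h should give something that tends to 0 with h.
def remainder_over_h(x, h):
    fg_x = math.sin(math.exp(x))                        # (f∘g)(x)
    fg_xh = math.sin(math.exp(x + h))                   # (f∘g)(x+h)
    linear = h * math.exp(x) * math.cos(math.exp(x))    # h·g'(x)·f'(g(x))
    return (fg_xh - fg_x - linear) / h

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, remainder_over_h(0.5, h))   # shrinks roughly linearly in h
```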

  • This proof seems really nice. In fact, I tried it this way... But how do you expand $f\bigl(g(x)+hg'(x)+o(h)\bigr)$ ? (without any hypothesis on what function $f$ is) – Cydonia7 Jun 19 '12 at 15:38
  • @Skydreamer: We do have a hypothesis on $f$, namely, its differentiability at $g(x)$. So just replace the $x$ in the first display of my answer with $g(x)$, and $h$ with $hg'(x)+o(h)$, and you're done. – Harald Hanche-Olsen Jun 19 '12 at 15:58
4

\begin{align} (f(g(x)))' &= \lim_{\epsilon\to 0} \frac{f(g(x+\epsilon))-f(g(x))}{\epsilon} \\[8pt] &= \lim_{\epsilon\to 0} \frac{f(g(x+\epsilon))-f(g(x))}{\epsilon} \cdot \frac{g(x+\epsilon)-g(x)}{g(x+\epsilon)-g(x)} \\[8pt] &= \lim_{\epsilon\to 0} \frac{f(g(x+\epsilon))-f(g(x))}{g(x+\epsilon)-g(x)} \cdot \frac{g(x+\epsilon)-g(x)}{\epsilon} \\[8pt] &= f'(g(x)) \cdot g'(x) \end{align}
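As a quick sanity check of the resulting formula (a numerical sketch with $f = \sin$ and $g(x) = x^3$ chosen only for illustration):

```python
import math

# Sanity check of the formula (assumed example: f = sin, g(x) = x**3):
# compare a central-difference approximation of (f∘g)'(x) with g'(x)·f'(g(x)).
def composed(x):
    return math.sin(x ** 3)                  # (f∘g)(x)

def chain_rule_value(x):
    return 3 * x ** 2 * math.cos(x ** 3)     # g'(x)·f'(g(x))

x, h = 1.2, 1e-6
numeric = (composed(x + h) - composed(x - h)) / (2 * h)
print(numeric, chain_rule_value(x))          # two nearly identical values
```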

Karolis Juodelė
1

Here is a version of Harald's answer without the $o$:

A function $g$ has a derivative $g'(x_0)$ at $x_0$ iff there is a trend function $m_g$, continuous at $x_0$, with $m_g(x_0)=g'(x_0)$ and $$g(x)-g(x_0)=m_g(x)\ (x-x_0)$$ for all $x$ in the domain of $g$.

Now let $g$ and $f$ be given, let $m_g$ be as before, and let $m_f$ be the trend function of $f$ at $y_0:=g(x_0)$. Then one has $$\eqalign{f\bigl(g(x)\bigr)-f\bigl(g(x_0)\bigr)&=f\bigl(g(x)\bigr)-f(y_0)\cr &=m_f\bigl(g(x)\bigr)\ (g(x)-y_0)\cr &=m_f\bigl(g(x)\bigr)\ m_g(x)\ (x-x_0)\ .\cr}$$ As $m_{f\circ g}(x):=m_f\bigl(g(x)\bigr)\ m_g(x)$ is continuous at $x_0$ and has the value $f'(y_0)\ g'(x_0)$ there, the claim follows.
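A numerical illustration of the trend functions (with $g(x)=x^2$, $f=\sin$, and $x_0=1$ chosen only as an example):

```python
import math

# Trend functions for g(x) = x**2 at x0 = 1 and f = sin at y0 = g(x0)
# (example functions assumed).  The claim: the trend function of f∘g
# factors as m_f(g(x)) · m_g(x).
x0 = 1.0
y0 = x0 ** 2

def m_g(x):   # (g(x) - g(x0)) / (x - x0), extended continuously by g'(x0)
    return (x ** 2 - x0 ** 2) / (x - x0) if x != x0 else 2 * x0

def m_f(y):   # same construction for f = sin at y0
    return (math.sin(y) - math.sin(y0)) / (y - y0) if y != y0 else math.cos(y0)

def m_fg(x):  # trend function of f∘g, computed directly (for x != x0)
    return (math.sin(x ** 2) - math.sin(x0 ** 2)) / (x - x0)

x = 1.3
print(m_fg(x), m_f(x ** 2) * m_g(x))   # the two values coincide
```

At $x_0$ itself the factored form gives $m_f(y_0)\,m_g(x_0)=\cos(1)\cdot 2$, the expected derivative of $\sin(x^2)$ at $1$.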

1

The idea in Karolis Juodelė's answer is the essential fact; it does run into a slight problem, namely, what to do if $g(x+h)-g(x)=0$ for values of $h$ arbitrarily close to $0$?

Morally, this shouldn't matter: if $g(x+h)-g(x)=0$ for arbitrarily small values of $h$, then this should mean that $g'(x)=0$, and then we can "ignore" the points that are giving us trouble and concentrate on the other ones, where the limit will also equal $0$ and not be a problem. But we need to do this formally, which leads to technical complications.

If $g'(a)\neq 0$, then $\lim\limits_{h\to 0}\frac{g(a+h)-g(a)}{h}=g'(a)\neq 0$, so for all sufficiently small values of $h$, different from zero, we have $g(a+h)-g(a)\neq 0$. Even if $g'(a)=0$, it is possible that this difference is never equal to $0$, e.g., $g(x) = x^2$ at $a=0$. So we need to divide the argument into two cases: the "easy" case and the tricky case.

Case 1. If there exists $\delta\gt 0$ such that for all $h$, $0\lt |h|\lt\delta$ implies $g(a+h)-g(a)\neq 0$, then we can proceed as in that answer: $$\begin{align*} \lim_{h\to 0}\frac{f\circ g(a+h) - f\circ g(a)}{h} &= \lim_{h\to 0}\frac{f(g(a+h)) - f(g(a))}{h}\\ &= \lim_{h\to 0}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}\frac{g(a+h)-g(a)}{h} \\&\qquad\qquad\qquad\qquad\qquad\text{(since denominator is not }0\text{)}\\ &=\lim_{h\to 0}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)} \lim_{h\to 0}\frac{g(a+h)-g(a)}{h}\\ &= \lim_{g(a+h)\to g(a)}\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}\lim_{h\to 0}\frac{g(a+h)-g(a)}{h}\\ &\qquad\qquad\qquad\qquad\qquad\text{(because }g\text{ is continuous at }a\text{)}\\ &= f'(g(a))g'(a)\\ &\qquad\qquad\qquad\qquad\qquad\text{(by the definition of derivative)} \end{align*}$$

Case 2. Suppose that for every $\delta\gt 0$ there exists $h$ such that $0\lt |h|\lt\delta$ and $g(a+h)=g(a)$.

Note that in order for this to happen, given that $g$ is supposed to be differentiable at $a$, we must have $g'(a)=0$: the difference quotient $$\frac{g(a+h)-g(a)}{h}$$ takes the value $0$ at points arbitrarily close to $h=0$, so the limit, if it exists, must be zero. Thus $g'(a)=0$, which means that in Case 2 we are trying to prove that $(f\circ g)'(a) = 0$.

To that end, we define a new function $\mathcal{K}(h)$ as follows: $$\mathcal{K}(h) = \left\{\begin{array}{ll} f'(g(a)) &\text{if }g(a+h)=g(a)\\ \frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)} &\text{if }g(a+h)\neq g(a). \end{array}\right.$$

I claim that $\lim_{h\to 0}\mathcal{K}(h) = f'(g(a))$. Indeed, since $f$ is differentiable at $g(a)$, for any $\epsilon\gt 0$ there exists $\delta_1\gt 0$ such that $0\lt |b|\lt\delta_1$ implies $$\left|\frac{f(g(a)+b)-f(g(a))}{b} - f'(g(a))\right|\lt\epsilon$$ and since $g$ is continuous at $a$, there exists $\delta_2\gt 0$ such that $0\leq |h|\lt \delta_2$ implies $|g(a+h)-g(a)|\lt \delta_1$. Hence, for all $h$ with $0\lt|h|\lt\delta_2$, either $g(a+h)=g(a)$ and $\mathcal{K}(h)$ is equal to $f'(g(a))$, or else $g(a+h)\neq g(a)$ and we have $$ \left|\mathcal{K}(h) - f'(g(a))\right| = \left|\frac{f(g(a+h))-f(g(a))}{g(a+h)-g(a)}-f'(g(a))\right| \lt\epsilon$$ since $|g(a+h)-g(a)|\lt\delta_1$. Thus, $\lim\limits_{h\to 0}\mathcal{K}(h) = f'(g(a))$.

Finally, notice that $$\frac{f(g(a+h))-f(g(a))}{h} = \mathcal{K}(h)\frac{g(a+h)-g(a)}{h}$$ for all $h\neq 0$. Indeed, if $g(a+h)\neq g(a)$, then the right hand side simplifies to the left hand side. And if $g(a+h)=g(a)$, then both sides are equal to $0$.

Now we have: $$\begin{align*} \lim_{h\to 0}\frac{f(g(a+h))-f(g(a))}{h} &= \lim_{h\to 0}\mathcal{K}(h)\frac{g(a+h)-g(a)}{h}\\ &= \lim_{h\to 0}\mathcal{K}(h)\lim_{h\to 0}\frac{g(a+h)-g(a)}{h}\\ &= f'(g(a))g'(a) \end{align*}$$ as desired.
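For a concrete instance of Case 2 (my own example, not from the argument above), take $g(x)=x^2\sin(1/x)$ with $g(0)=0$: it vanishes at $h=1/(n\pi)$ for every nonzero integer $n$, yet $g'(0)=0$, and the difference quotient of $f\circ g$ at $0$ indeed tends to $0$:

```python
import math

# Example for Case 2 (function chosen by me, not from the answer):
# g(x) = x²·sin(1/x), g(0) = 0.  Then g(h) = 0 at h = 1/(nπ), i.e. at
# points arbitrarily close to 0, and g'(0) = 0.
def g(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

f = math.exp   # any differentiable f will do here

# The difference quotient of f∘g at a = 0 should tend to f'(g(0))·g'(0) = 0.
for h in (1e-1, 1e-3, 1e-5):
    print(h, (f(g(h)) - f(g(0))) / h)   # magnitude is at most about |h|
```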

Arturo Magidin
0

The other proofs are just fine (the minor issues can easily be fixed); however, there is another way to look at this, and to me it gives some more insight into why the formula looks the way it does.

I don't know if you are aware of it, but there is a non-trivial connection between differentiation and combinatorial structures. Let's assume that the natural number $n$ is identified with a set of $n$ elements, i.e. $n = \{0,\ldots,n-1\}$, meaning $0 = \varnothing$, $1 = \{0\}$, $2 = \{0, 1\}$, etc. Then the usual operators $+$ (addition) and $\times$ (multiplication) gain additional power as disjoint sum and Cartesian product, e.g.

$$A + A = \{0,1\} \times A = 2 \times A.$$ $$A \times (B+C) = A\times B + A\times C$$

In this setting the derivative $\frac{dF[A]}{dA}$ behaves like taking a single $A$ out of a structure $F[A]$ (the notation $F[A]$ means that the structure $F$ depends on the structure $A$).

For example, let $A$ be some set, and consider the set of all pairs $A \times A \cong A^{\{0,1\}} = A^2$ (the symbol $Y^X$ denotes the set of all functions $X \to Y$; here $F[Z] = Z^2$). Then, what happens if you remove one $A$ from the structure $A^2$? You get $\langle a, \bullet\rangle$ or $\langle \bullet, a\rangle$ (the symbol $\bullet$ is just a placeholder), i.e. $A + A = 2 \times A$. It is easy to generalize this to $n$-tuples, that is, $(A^n)' = nA^{n-1}$.

And now we can interpret $F[G[A]]' = F'[G[A]] \times G'[A]$: to take one $A$ out of $F[G[A]]$, you first need to take a single $G[A]$ out of $F[G[A]]$, and then an $A$ out of that $G[A]$.
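A small brute-force check of the $(A^n)' = nA^{n-1}$ example (a sketch; the set and the sizes are my own choice):

```python
from itertools import product

# Brute-force check of (A^n)' = n·A^(n-1), read combinatorially: a
# "pointed" n-tuple over A with the marked coordinate removed is a pair
# (marked slot, remaining (n-1)-tuple), and there are n·|A|^(n-1) of them.
A = range(3)   # |A| = 3
n = 4

pointed = {(i, t[:i] + t[i + 1:])        # forget the removed element
           for t in product(A, repeat=n)
           for i in range(n)}
print(len(pointed), n * len(A) ** (n - 1))   # 108 108
```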

I hope I haven't confused you too much. The original thought comes from this article. Good luck!

dtldarek
0

Here's an informal way to visualize the result.

Consider the space-curve parameterized by $P(t) := (t,g(t),f(g(t)))$. Hand-waving differentiability concerns, let $u := (a,b,c)$ --with $a$ and $b$ non-zero-- be a vector tangent to this curve at $P(t_0)$.

Projecting into the $xy$-plane, the curve becomes the graph of $y=g(x)$; and $u$ becomes $(a,b,0)$, which (by more hand-waving) is tangent to the graph at the projection of point $P(t_0)$. Thus, the change-in-$y$-over-change-in-$x$ slope of the graph at that point is $b/a$: $$g^{\prime}(t_0) = \frac{b}{a}$$

Likewise, in the $xz$-plane, the change-in-$z$-over-change-in-$x$ slope of the graph of $z=f(g(x))$ is $c/a$. That is, $$\left(f \circ g\right)^{\prime}(t_0) = \frac{c}{a}$$

Finally, in the $yz$-plane, the change-in-$z$-over-change-in-$y$ slope of the graph of $z=f(y)$ --at $y=g(t_0)$!-- is $c/b$: $$f^{\prime}(g(t_0))= \frac{c}{b}$$

Thus, $$\frac{c}{a} = \frac{b}{a} \cdot \frac{c}{b} \qquad \implies \qquad \left(f \circ g \right)^{\prime} = g^{\prime} \cdot \left( f^{\prime} \circ g \right)$$
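One can watch the three slope ratios emerge numerically by replacing the tangent vector with a small secant (here $f=\sin$, $g=\exp$, and $t_0=0.7$ are assumed purely for illustration):

```python
import math

# Numerical illustration of the tangent-vector argument (assumed example:
# f = sin, g = exp, t0 = 0.7).  Approximate the tangent u = (a, b, c) to
# the curve P(t) = (t, g(t), f(g(t))) by a small secant displacement.
t0, h = 0.7, 1e-6
a = h
b = math.exp(t0 + h) - math.exp(t0)                      # Δg
c = math.sin(math.exp(t0 + h)) - math.sin(math.exp(t0))  # Δ(f∘g)

print(b / a, math.exp(t0))              # b/a ≈ g'(t0)
print(c / b, math.cos(math.exp(t0)))    # c/b ≈ f'(g(t0))
print(c / a, (b / a) * (c / b))         # chain rule: c/a = (b/a)·(c/b)
```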


This is a generalization of the phenomenon shown in this answer, which uses a helix to illustrate and link the derivatives of sine and cosine via the geometry of tangents to the projected circle. (Oh, hey ... and it appears that I described this generalization in a comment to that answer. Pardon my redundancy.)

Blue