
Suppose $E\subseteq\mathbb R^n$ and $f$ maps $E$ into $\mathbb R^m$. Let $g$ map a subset of $\mathbb R^m$ into $\mathbb R^p$. If $f$ is differentiable at $x\in E$ and $g$ is differentiable at $f(x) \in f(E)$, then the composition $g \circ f$ is differentiable at $x$ and $$(g\circ f)'(x) = g'(f(x))\, f'(x),$$

where the indicated product is matrix multiplication.

Although this version of the chain rule may look a bit strange, it is really just the familiar chain rule of calculus in a new guise. You can convince yourself of this fact by writing the formula out in terms of partial derivatives.
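
As a sketch of that expansion (the component names $f_k$, $g_i$ and the coordinates $y_k$ on $\mathbb R^m$ are notation assumed here, not taken from the quoted text): $g'(f(x))$ is the $p\times m$ matrix with entries $\partial g_i/\partial y_k$ evaluated at $f(x)$, and $f'(x)$ is the $m\times n$ matrix with entries $\partial f_k/\partial x_j$ evaluated at $x$. The $(i,j)$ entry of the matrix product is then
$$\frac{\partial (g\circ f)_i}{\partial x_j}(x) = \sum_{k=1}^m \frac{\partial g_i}{\partial y_k}(f(x))\,\frac{\partial f_k}{\partial x_j}(x), \qquad 1\le i\le p,\ 1\le j\le n.$$
For $n=m=p=1$ the sum has a single term and this reduces to the single-variable rule $(g\circ f)'(x)=g'(f(x))\,f'(x)$.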

This is something I don't understand. I don't find this statement of the chain rule intuitive. Can someone explain it?

  • See Intuitive Proof of the Multivariable Chain Rule. The chain rule written abstractly in terms of compositions of linear maps ($D(g\circ f)_x=Dg_{f(x)}\circ Df_x$) is, I would say, the most natural thing. The ad hoc misfit is the formula involving partial derivatives. – peek-a-boo Sep 03 '21 at 11:03
  • @peek-a-boo your answer there is excellent. In that answer you mention it is tough to prove that the “stuff” involving the error corrections $\gamma,\phi$ goes to zero faster than $|h|$, but doesn’t that follow immediately from $D$ being linear and from the definition of $\gamma,\phi$? – FShrike Sep 03 '21 at 11:24
  • @FShrike I mean "tough" is a subjective term. Introductory books may not provide all the details, as when first learning it can be tough (mainly because people aren't used to rigorous estimates). To me, now, it's a trivial estimate (because I've done much more complicated analysis, so that little chain rule proof really is trivial in comparison). What we need is not linearity of $D$, but rather that $Dg_{f(a)}$ and $Df_a$ are continuous (often called bounded) linear maps, which means they satisfy a Lipschitz condition. From here, the decay of $\gamma$ and $\phi$ completes the argument. – peek-a-boo Sep 03 '21 at 11:29
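
The estimate discussed in these comments can be sketched as follows, under assumed notation (the linked answer may assign $\phi$ and $\gamma$ differently): write $f(a+h)=f(a)+Df_a h+\phi(h)$ and $g(f(a)+k)=g(f(a))+Dg_{f(a)}k+\gamma(k)$, where $|\phi(h)|/|h|\to 0$ as $h\to 0$ and $|\gamma(k)|/|k|\to 0$ as $k\to 0$. Taking $k=Df_a h+\phi(h)$ gives
$$g(f(a+h))-g(f(a))-Dg_{f(a)}Df_a h = Dg_{f(a)}\phi(h)+\gamma(k).$$
Boundedness of the linear maps gives $|Dg_{f(a)}\phi(h)|\le\|Dg_{f(a)}\|\,|\phi(h)|$ and, for all sufficiently small $h$, $|k|\le(\|Df_a\|+1)\,|h|$. The first bound shows $Dg_{f(a)}\phi(h)$ is $o(|h|)$; the second shows $k\to 0$ with $|k|/|h|$ bounded, so $\gamma(k)$ is $o(|h|)$ as well.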
