
In Duistermaat and Kolk's *Multidimensional Real Analysis I: Differentiation*, the authors give a very detailed statement of the chain rule at a point $a$:

[Image: statement of the chain rule from the textbook]

Now I want to use the chain rule to prove the following corollary:

[Image: statement of the corollary, that $D(\lambda f_1+f_2)(a) = \lambda Df_1(a) + Df_2(a)$]

Here is my proof:

  1. Step 1:

     We define $f: \mathbf{R}^n \rightarrow \mathbf{R}^p \times \mathbf{R}^p$ by $f(x) = (f_1(x), f_2(x))$. We know $f$ is differentiable at $a$, with $Df(a)h = (Df_1(a)h, Df_2(a)h)$ for all $h \in \mathbf{R}^n$. We define $g: \mathbf{R}^p \times \mathbf{R}^p \rightarrow \mathbf{R}^p$ by $g(y_1, y_2) = \lambda y_1 + y_2$. So we have $g \circ f = \lambda f_1 + f_2$.

  2. Step 2:

\begin{align*} D(\lambda f_1+f_2)(a) &= D(g\circ f)(a) \\ &= Dg(f_1(a), f_2(a)) \circ Df(a) \tag{chain rule} \\ &= g \circ Df(a) \tag{fortunately $g$ is linear, so $g$ is its own derivative everywhere} \end{align*}

  3. Step 3:

     We are still not done, since we want to prove that two linear maps are equal. So we introduce an arbitrary vector $h \in \mathbf{R}^n$ and check that both sides agree on it: $g \circ Df(a)h = g(Df_1(a)h, Df_2(a)h) = \lambda Df_1(a)h + Df_2(a)h = (\lambda Df_1(a) + Df_2(a))(h)$. The proof is complete.
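As an optional numerical sanity check (entirely my own addition; the maps $f_1$, $f_2$ below are arbitrary sample choices, not from the textbook), one can compare a finite-difference Jacobian of $\lambda f_1 + f_2$ at $a$ against $\lambda Df_1(a) + Df_2(a)$:

```python
import numpy as np

# Sanity check of the corollary D(lam*f1 + f2)(a) = lam*Df1(a) + Df2(a).
# f1, f2 are arbitrary sample maps R^2 -> R^2 (my own choice, not from
# the textbook); derivatives are approximated by central differences.

def f1(x):
    return np.array([np.sin(x[0]) * x[1], x[0] ** 2 + x[1]])

def f2(x):
    return np.array([np.exp(x[0]), x[0] * x[1] ** 2])

def jacobian(f, a, eps=1e-6):
    """Central-difference approximation of the Jacobian Df(a)."""
    a = np.asarray(a, dtype=float)
    cols = [(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
            for e in np.eye(a.size)]
    return np.column_stack(cols)

lam = 3.0
a = np.array([0.5, -1.2])

lhs = jacobian(lambda x: lam * f1(x) + f2(x), a)  # D(lam*f1 + f2)(a)
rhs = lam * jacobian(f1, a) + jacobian(f2, a)     # lam*Df1(a) + Df2(a)

print(np.allclose(lhs, rhs, atol=1e-6))  # agree up to finite-difference error
```

This is no substitute for the proof, of course, but it catches algebra mistakes cheaply.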

I have three questions:

  1. Is my proof strictly/rigorously correct?
  2. Is it necessary to introduce this $h$ to prove "two derivatives are equal"?
  3. Can my proof be further simplified? Why does the author just say it's obvious, when it seems quite complicated to me?
Hamilton
  • Is this really proved as a corollary of the chain rule in your textbook? It seems simpler to confirm by a short calculation (which is indeed trivial) that the right hand side has the defining property of the left hand side of the equation. – Vercassivelaunos Jul 29 '21 at 17:12
  • @Vercassivelaunos Yes, we can prove it directly from the definition: i.e. prove (using the triangle inequality) that \begin{align} \frac{\big\|\,(\lambda f_1+f_2)(a+h) - (\lambda f_1+f_2)(a) - [\lambda (Df_1)_a(h)+(Df_2)_a(h)]\,\big\|}{\|h\|}\to 0 \end{align} as $h\to 0$. (I copied this from peek-a-boo's answer) – Hamilton Jul 30 '21 at 07:56

1 Answer

  1. Yes, your proof is right (I just checked my copy of Duistermaat and Kolk and indeed you didn't invoke anything not already proven).
  2. Sure, one can introduce the vector $h$ to see that the evaluation of both sides is the same. But to me this seems 'obvious'. Indeed I could reverse the question and ask you why in step 1 you did not verify explicitly (by plugging in an arbitrary $x$) that $g\circ f = \lambda f_1+f_2$ (I'm guessing you didn't do this because it seemed obvious enough because the functions $f$ and $g$ were literally constructed to make this equation true).
  3. I'm not sure if this proof can be simplified, but the authors probably said it is obvious because all you're doing is applying the chain rule to carefully chosen functions.

So, if it was me, I would have just written the following for a proof:

(Define $f,g$ as you have done). Then, $\lambda f_1+f_2=g\circ f$, so \begin{align} D(\lambda f_1+f_2)_a&=D(g\circ f)_a\\ &=Dg_{f(a)}\circ Df_a\tag{chain rule}\\ &=g\circ Df_a\tag{$g$ is linear}\\ &=g\circ ((Df_1)_a,(Df_2)_a)\\ &=\lambda (Df_1)_a+(Df_2)_a \end{align}

So, the only theorems being invoked are the chain rule and that linear maps have derivatives equal to themselves.

Of course, having now proven this theorem as a consequence of the chain rule, you should also prove it directly from the definition: i.e. prove (using the triangle inequality) that \begin{align} \frac{\big\|\,(\lambda f_1+f_2)(a+h) - (\lambda f_1+f_2)(a) - [\lambda (Df_1)_a(h)+(Df_2)_a(h)]\,\big\|}{\|h\|}\to 0 \end{align} as $h\to 0$.
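This limit can also be watched numerically (a sketch of my own, with sample maps and their exact Jacobians chosen for illustration; none of this is from the answer): the difference quotient should shrink as $\|h\| \to 0$.

```python
import numpy as np

# Illustration of the definition-based proof: the quotient
# ||(lam*f1+f2)(a+h) - (lam*f1+f2)(a) - [lam*Df1(a)h + Df2(a)h]|| / ||h||
# should tend to 0 as h -> 0. The maps f1, f2 and their exact Jacobians
# are my own sample choices.

def f1(x):
    return np.array([x[0] * x[1], np.cos(x[0])])

def Df1(a):  # exact Jacobian of f1
    return np.array([[a[1], a[0]], [-np.sin(a[0]), 0.0]])

def f2(x):
    return np.array([x[0] ** 2, x[1] ** 3])

def Df2(a):  # exact Jacobian of f2
    return np.array([[2 * a[0], 0.0], [0.0, 3 * a[1] ** 2]])

lam = 2.0
a = np.array([0.3, 0.7])
g = lambda x: lam * f1(x) + f2(x)
candidate = lam * Df1(a) + Df2(a)  # the claimed derivative D(lam*f1+f2)(a)
direction = np.array([1.0, -1.0]) / np.sqrt(2.0)

quotients = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    q = np.linalg.norm(g(a + h) - g(a) - candidate @ h) / np.linalg.norm(h)
    quotients.append(q)
    print(f"|h| = {t:.0e}, quotient = {q:.2e}")  # shrinks roughly linearly in |h|
```

For these smooth maps the quotient decays roughly linearly in $\|h\|$, consistent with a second-order Taylor remainder.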


By the way, one can prove things in a different order. Usually, one proves linearity of the derivative directly, and then proves the chain rule. From there, all other facts can be derived. For example, the fact that if $f_1,f_2$ are differentiable at $a$ and $f(x):=(f_1(x),f_2(x))$, then $f$ is also differentiable at $a$ with $Df_a(h)=((Df_1)_a(h),(Df_2)_a(h))$, can be proven as follows:

Define $\iota_1:\Bbb{R}^{p_1}\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ and $\iota_2:\Bbb{R}^{p_2}\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ by $\iota_1(x)=(x,0)$ and $\iota_2(y)=(0,y)$. Then given two mappings $f_1:\Bbb{R}^n\to\Bbb{R}^{p_1}$ and $f_2:\Bbb{R}^{n}\to\Bbb{R}^{p_2}$, we define $f:\Bbb{R}^n\to\Bbb{R}^{p_1}\times\Bbb{R}^{p_2}$ by $f(x)=(f_1(x),f_2(x))$.

Then, it is easily verified that $f=\iota_1\circ f_1+\iota_2\circ f_2$, and that the $\iota_i$ are linear transformations. So, \begin{align} Df_a(h)&=D(\iota_1\circ f_1+\iota_2\circ f_2)_a(h)\\ &=[\iota_1\circ (Df_1)_a](h)+[\iota_2\circ (Df_2)_a](h)\\ &=\bigg((Df_1)_a(h),(Df_2)_a(h)\bigg) \end{align} (of course, in the second equal sign I did many steps at once: I used additivity of derivatives, the chain rule, and that $\iota_1,\iota_2$ are linear, so they are their own derivatives).
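The decomposition can also be checked numerically (a sketch with $p_1 = p_2 = 2$ and sample maps $f_1, f_2$ of my own choosing, using a central-difference Jacobian):

```python
import numpy as np

# The iota decomposition with p1 = p2 = 2: f = iota1∘f1 + iota2∘f2
# pointwise, and Df(a) stacks Df1(a) on top of Df2(a). The maps f1, f2
# are my own sample choices for illustration.

def iota1(x):  # x |-> (x, 0), linear
    return np.concatenate([x, np.zeros_like(x)])

def iota2(y):  # y |-> (0, y), linear
    return np.concatenate([np.zeros_like(y), y])

def f1(x):
    return np.array([x[0] + x[1], x[0] * x[1]])

def f2(x):
    return np.array([np.sin(x[0]), x[1] ** 2])

def f(x):
    return np.concatenate([f1(x), f2(x)])

def jacobian(g, a, eps=1e-6):
    """Central-difference approximation of Dg(a)."""
    cols = [(g(a + eps * e) - g(a - eps * e)) / (2 * eps)
            for e in np.eye(a.size)]
    return np.column_stack(cols)

a = np.array([0.4, 1.1])

# f = iota1∘f1 + iota2∘f2 as maps
pointwise_ok = np.allclose(f(a), iota1(f1(a)) + iota2(f2(a)))

# Df(a) is the vertical stack of Df1(a) and Df2(a)
stacked = np.vstack([jacobian(f1, a), jacobian(f2, a)])
jacobian_ok = np.allclose(jacobian(f, a), stacked, atol=1e-6)

print(pointwise_ok, jacobian_ok)
```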

The idea of introducing such "auxiliary" mappings $\iota$ is very common when you're trying to prove that more complicated maps are differentiable (for example, one can formulate a very general product rule this way).

peek-a-boo
  • So many thanks for your detailed explanation, especially the proof of things the other way round! I find that using the chain rule requires so much caution! – Hamilton Jul 30 '21 at 07:27