
I have some questions about the following theorem and its proof.

Theorem. Let $X\subset \mathbb R^n$ be open, convex, and bounded, and let $f_n:X \to \mathbb R^m$ be a sequence of differentiable functions. Assume the following:

  1. There exists $H:\mathbb R^n\rightarrow L(\mathbb R^n,\mathbb R^m)$ such that $Df_n \rightarrow H$ uniformly on $X$, where $Df_n$ is the differential of $f_n$.
  2. There exists $x_0 \in X$ such that the sequence $\{f_n(x_0)\}$ converges.

Then there exists $f:X \rightarrow \mathbb R^m$ such that $f_n \rightarrow f$ uniformly on $X$, $f$ is differentiable, and $Df(x)=H(x)$ for all $x\in X$.

Proof. Notice that
\begin{align*}
\frac{f(x)-f(x_{0})-H(x_{0})\cdot(x-x_{0})}{\Vert x-x_{0}\Vert} & \mathrel{{\color{fuchsia}=}}\frac{f(x)-f(x_{0})-[f_{n}(x)-f_{n}(x_{0})]}{\Vert x-x_{0}\Vert}\\
& \quad+\frac{f_{n}(x)-f_{n}(x_{0})-\nabla f_{n}(x_{0})\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}+\frac{(\nabla f_{n}(x_{0})-H(x_{0}))\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}\\
& =:I+II+III.
\end{align*}
Since $X$ is convex, by applying the mean value theorem to the function
$$
g_{n,m}(t)=f_{m}(tx+(1-t)x_{0})-f_{n}(tx+(1-t)x_{0}),\quad t\in[0,1]
$$
$\color{fuchsia}{**...(2)**}$

there is $t_{0}$ such that
\begin{align*}
f_{m}(x)-f_{m}(x_{0})-[f_{n}(x)-f_{n}(x_{0})] & =g_{n,m}(1)-g_{n,m}(0)\\
& =g_{n,m}^{\prime}(t_{0})=(\nabla f_{m}(z_{0})-\nabla f_{n}(z_{0}))\cdot(x-x_{0}),
\end{align*}
where $z_{0}=t_{0}x+(1-t_{0})x_{0}$. By uniform convergence of the gradients,
$$
\Vert\nabla f_{m}(z)-\nabla f_{n}(z)\Vert\leq\Vert\nabla f_{m}(z)-H(z)\Vert+\Vert\nabla f_{n}(z)-H(z)\Vert\leq2\varepsilon
$$
for all $n,m\geq n_{\varepsilon}$ and all $\color{fuchsia}{z\in X}$. Hence, by Cauchy's inequality,
\begin{align*}
\left\vert\frac{f_{m}(x)-f_{m}(x_{0})-[f_{n}(x)-f_{n}(x_{0})]}{\Vert x-x_{0}\Vert}\right\vert & =\left\vert\frac{(\nabla f_{m}(z_{0})-\nabla f_{n}(z_{0}))\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}\right\vert\\
& \leq\Vert\nabla f_{m}(z_{0})-\nabla f_{n}(z_{0})\Vert\leq2\varepsilon.
\end{align*}
Since $X$ is bounded, say $\Vert x-x_{0}\Vert\leq M$ for all $x\in X$, this inequality implies that
$$
\vert f_{m}(x)-f_{n}(x)\vert\le\vert f_{m}(x_{0})-f_{n}(x_{0})\vert+2\varepsilon\Vert x-x_{0}\Vert\le\vert f_{m}(x_{0})-f_{n}(x_{0})\vert+2M\varepsilon,
$$
and so $\{f_n\}$ is a uniform Cauchy sequence, and hence it converges uniformly to a function $f$. Letting $m\rightarrow\infty$ $\color{fuchsia}{**...(3)**}$ we get
$$
\left\vert\frac{f(x)-f(x_{0})-[f_{n}(x)-f_{n}(x_{0})]}{\Vert x-x_{0}\Vert}\right\vert\leq2\varepsilon
$$
for all $n\geq n_{\varepsilon}$. This takes care of $I$. Taking $n=n_{\varepsilon}$ $\color{fuchsia}{**...(4)**}$ and using the fact that $f_{n_{\varepsilon}}$ is differentiable at $x_{0}$, we get that
$$
\left\vert\frac{f_{n_{\varepsilon}}(x)-f_{n_{\varepsilon}}(x_{0})-\nabla f_{n_{\varepsilon}}(x_{0})\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}\right\vert\leq\varepsilon
$$
for all $x\in X$ with $0<\Vert x-x_{0}\Vert\leq\delta_{\varepsilon}$. This takes care of $II$.

Lastly, by Cauchy's inequality,
$$
\left\vert\frac{(\nabla f_{n}(x_{0})-H(x_{0}))\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}\right\vert\leq\Vert\nabla f_{n}(x_{0})-H(x_{0})\Vert\leq\varepsilon
$$
for all $n\geq n_{\varepsilon}$. In conclusion, we have that for all $x\in X$ with $0<\Vert x-x_{0}\Vert\leq\delta_{\varepsilon}$,
$$
\left\vert\frac{f(x)-f(x_{0})-H(x_{0})\cdot(x-x_{0})}{\Vert x-x_{0}\Vert}\right\vert\leq4\varepsilon,
$$
which implies that $f$ is differentiable at $x_{0}$ with $\nabla f(x_{0})=H(x_{0})$. $\color{fuchsia}{**...(5)**}$ By repeating the proof with $x_0$ replaced by any other point, we get that $f$ is differentiable in $X$.

My questions are in $\color{fuchsia}{\text{pink}}$.

  1. At the beginning, why is it that $H(x_0)\cdot(x-x_0)= f_{n}(x)-f_{n}(x_{0})$?
  2. How is $g_{n,m}$ defined? I think its domain is $[0,1]$, but I don't know what its codomain is.
  3. What is the form of $z$? Is it simply a vector in $X$, or does it have the form of $z_0$?
  4. Why do we take $m\to\infty$? Could it have been $n$? And why does taking $m\to\infty$ imply the next inequality?
  5. Why do we take $n=n_{\varepsilon}$? I really don't see this step.
  6. Why do we need to repeat the proof? Isn't $x_0$ arbitrary?
  7. How can the proof be made formal? It seems that $\varepsilon>0$ should be fixed at the beginning, and perhaps $x,x_0\in X$ must be given too. Or should they be introduced in the middle of the proof? Where, exactly?

1 Answer


I tried to understand exactly which arguments are unclear to you. Let me know if I didn't cover something. $$ \newcommand{\R}{\Bbb{R}} $$

  1. It's not that $H(x_0)\cdot(x-x_0) = f_n(x)-f_n(x_0)$; no such equality is being claimed. This step is a common algebraic trick (adding and subtracting the same quantity): $$ \frac{f(x)-f(x_0)-H(x_0)\cdot(x-x_0)}{\| x-x_0 \|} + \frac{f_n(x)-f_n(x_0) + \nabla f_n(x_0)\cdot(x-x_0)}{\|x-x_0\|} - \frac{f_n(x)-f_n(x_0) + \nabla f_n(x_0)\cdot(x-x_0)}{\|x-x_0\|}. $$ Rearranging the terms conveniently gives the decomposition $I+II+III$ above (see the worked regrouping after this list).
  2. This is another common trick in multivariate calculus: by hypothesis, $X$ is a convex domain, i.e. it contains (by definition) the whole segment $xy$ whenever $x, y \in X$. In this specific case, the points are $x$ and $x_0$. We now parametrize the segment between $x$ and $x_0$ in the following way: $$ \phi(t) = tx + (1-t)x_0, \quad t \in [0,1]. $$ You can easily check this is indeed such a parametrization. The proof then defines $$ g_{n,m}(t) = f_m(tx+(1-t)x_0) - f_n(tx+(1-t)x_0), \quad t \in [0,1], $$ which is a double sequence of functions. Each function has domain $[0,1]$ and codomain $\R^m$: this is because the $f_n$ themselves take values in $\R^m$ (the precise target space isn't actually relevant to the proof). Now notice that $$ f_m(x) - f_m(x_0) - [f_n(x)-f_n(x_0)] = f_m(x) - f_n(x) - (f_m(x_0) - f_n(x_0)) = g_{n,m}(1) - g_{n,m}(0). $$ We then apply the MVT to each of the $g_{n,m}$: there exists a $t_0 \in (0,1)$ such that $$ \frac{g_{n,m}(1) - g_{n,m}(0)}{1-0} = g_{n,m}(1) - g_{n,m}(0) = g'_{n,m}(t_0). $$ The $t$s were just parameters for the segment $xx_0$, so to $t_0$ there corresponds a point on that segment, which we label $z_0$. If we compute $g'_{n,m}$ using the chain rule, we get $$ g'_{n,m}(t) = (\nabla f_m(tx+(1-t)x_0) - \nabla f_n(tx+(1-t)x_0))\cdot(x-x_0). $$ That's why $g'_{n,m}(t_0) = (\nabla f_m(z_0) - \nabla f_n(z_0))\cdot(x-x_0)$ (the chain-rule computation is spelled out after this list).
  3. $z$ is taken in $X$, so it is simply a vector. However, it is arbitrary, while $z_0$ is not: $z_0$ must have the property required by the MVT, so it lies on the segment between $x_0$ and $x$ and has the form $z_0 = t_0x + (1-t_0)x_0$.
  4. The inequality here is just the same inequality as before, but in the limit as $m \to \infty$. It is a property of limits that they preserve (weak) inequalities (see the lemma after this list). We let $m$ go to infinity, and not $n$, because $m$ was only an auxiliary index: we introduced it to use the uniform convergence of the gradients and prove the first inequality. Instead, $n$ is the index of the sequence $\{f_n\}_{n\in\Bbb{N}}$, so we cannot dispense with it.
  5. Taking $n = n_\varepsilon$ is just a way to guarantee that the estimates hold simultaneously: it could be, for example, that the second estimate is valid from some $n_1 < n_\varepsilon$ on, in which case between $n_1$ and $n_\varepsilon$ we would lose the first estimate. We don't want that to happen. It's just a way of saying "for all sufficiently large values of $n$" (see the note after this list).
  6. $x_0$ is not arbitrary at the start: it is fixed by the hypothesis (it's the point where the sequence $\{f_n(x_0)\}_{n\in\Bbb{N}}$ is assumed to converge), and it is $x$ which varies, so the argument as written only proves differentiability at $x_0$. The repetition is legitimate, though: once the uniform convergence of $\{f_n\}$ has been established, the sequence converges at every point of $X$, so any point of $X$ can play the role of $x_0$ in the differentiability part of the proof. "Repeating the proof" just means observing this.
  7. The proof is sufficiently formal, in my opinion. See the previous point for a clarification of the roles of $x$ and $x_0$. The implicit condition $\varepsilon > 0$ comes from the standard $(\varepsilon, \delta)$-definition of limit: to spell everything out, one would begin the differentiability argument with "fix $x_0 \in X$ and let $\varepsilon > 0$ be given", and then let $x$ range over the points with $0 < \Vert x - x_0 \Vert \le \delta_\varepsilon$.
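**Worked regrouping for point 1.** To see explicitly that the add-and-subtract step yields $I + II + III$, sum the three numerators (a short check; every term already appears in the proof): $$ f(x)-f(x_{0})\underbrace{-[f_{n}(x)-f_{n}(x_{0})]+[f_{n}(x)-f_{n}(x_{0})]}_{=0}\underbrace{-\nabla f_{n}(x_{0})\cdot(x-x_{0})+\nabla f_{n}(x_{0})\cdot(x-x_{0})}_{=0}-H(x_{0})\cdot(x-x_{0}), $$ which is exactly the numerator of the left-hand side.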
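**Chain-rule computation for point 2.** Writing $\phi(t) = tx + (1-t)x_0$, we have $\phi(0) = x_0$, $\phi(1) = x$ and $\phi'(t) = x - x_0$, so $$ g'_{n,m}(t) = \big(\nabla f_m(\phi(t)) - \nabla f_n(\phi(t))\big)\cdot\phi'(t) = \big(\nabla f_m(tx+(1-t)x_0) - \nabla f_n(tx+(1-t)x_0)\big)\cdot(x-x_0). $$ Evaluating at $t = t_0$ and setting $z_0 = \phi(t_0)$ gives the expression used in the proof.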
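**The limit lemma used in point 4.** If $a_m \le c$ for all large $m$ and $a_m \to a$, then $a \le c$. Here it is applied, with $n$ held fixed, to $$ a_m = \left\vert \frac{f_m(x)-f_m(x_0)-[f_n(x)-f_n(x_0)]}{\Vert x-x_0\Vert} \right\vert, \qquad c = 2\varepsilon: $$ since $f_m(x) \to f(x)$ and $f_m(x_0) \to f(x_0)$, we get $$ a_m \to \left\vert \frac{f(x)-f(x_0)-[f_n(x)-f_n(x_0)]}{\Vert x-x_0\Vert} \right\vert \le 2\varepsilon. $$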
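**Why $n = n_\varepsilon$ works in point 5.** The three estimates in the proof are $$ I \le 2\varepsilon \ \text{ for all } n \ge n_\varepsilon, \qquad III \le \varepsilon \ \text{ for all } n \ge n_\varepsilon, \qquad II \le \varepsilon \ \text{ for } n = n_\varepsilon \text{ and } 0 < \Vert x-x_0\Vert \le \delta_\varepsilon. $$ Any single fixed $n \ge n_\varepsilon$ keeps the first and third estimates, while fixing one concrete function $f_{n_\varepsilon}$ lets us use its differentiability at $x_0$ to produce $\delta_\varepsilon$ (each $f_n$ has its own $\delta$); $n = n_\varepsilon$ is simply the most convenient choice.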
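Finally, a quick numerical sanity check can make the statement concrete. This is a minimal sketch with a sequence I chose for illustration (it is not from the question): $f_n(x) = x^2 + \sin(nx)/n^2$ on $X = (-1,1)$, for which $Df_n(x) = 2x + \cos(nx)/n \to H(x) = 2x$ uniformly and $f_n(0) = 0$ converges, so the theorem predicts $f_n \to f(x) = x^2$ uniformly with $Df = H$.

```python
import numpy as np

# Illustrative sequence (my own choice, not from the original post):
# f_n(x)  = x^2 + sin(n x)/n^2   on X = (-1, 1)
# Df_n(x) = 2x + cos(n x)/n      -> H(x) = 2x uniformly (sup error 1/n)
# f_n(0)  = 0 converges, so both hypotheses of the theorem hold.
xs = np.linspace(-0.99, 0.99, 2001)

for n in (10, 100, 1000):
    fn = xs**2 + np.sin(n * xs) / n**2
    dfn = 2 * xs + np.cos(n * xs) / n
    # sup-norm distances to the predicted limits f(x) = x^2 and H(x) = 2x
    err_f = np.max(np.abs(fn - xs**2))
    err_df = np.max(np.abs(dfn - 2 * xs))
    print(f"n={n:5d}  sup|f_n - f| = {err_f:.2e}  sup|Df_n - H| = {err_df:.2e}")
```

The sup-norm errors shrink like $1/n^2$ and $1/n$ respectively, consistent with the theorem's conclusion.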