Understanding the proof of the Implicit Mapping Theorem

Question

I am following Advanced Calculus of Several Variables by C.H. Edwards, Jr. I am having trouble following the logic of Theorem III-$3.4$, stated below:

Theorem $3.4$: Let the mapping $G: \mathscr{R}^{m+n} \rightarrow \mathscr{R}^{n}$ be $\mathscr{C}^{1}$ in a neighborhood of the point $(a,b)$ where $G(a,b)=0$. If the partial derivative matrix $D_{2} G(a, b)$ is nonsingular, then there exists a neighborhood $U$ of $a$ in $\mathscr{R}^{m}$, a neighborhood $W$ of $(a, b)$ in $\mathscr{R}^{m+n}$, and a $\mathscr{C}^{1}$ mapping $h: U \rightarrow \mathscr{R}^{n}$, such that $y=h(x)$ solves the equation $G(x, y)=0$ in $W$.

In particular, the implicitly defined mapping $h$ is the limit of the sequence of successive approximations defined inductively by

$$ \begin{aligned} &\qquad h_{0}(\mathbf{x})=\mathbf{b}, \quad h_{k+1}(\mathbf{x})=h_{k}(\mathbf{x})-D_{2} G(\mathbf{a}, \mathbf{b})^{-1} G\left(\mathbf{x}, h_{k}(\mathbf{x})\right) \end{aligned} $$
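To see the scheme in action, here is a minimal numerical sketch with $m=n=1$. The choice $G(x,y)=x^2+y^2-1$ (the unit circle) and the base point $(a,b)=(0,1)$ are hypothetical illustrations, not from the book. There $D_2G(a,b)=2b=2$ is nonsingular, and the iterates should approach the upper half of the circle, $y=\sqrt{1-x^2}$.

```python
import math

def G(x, y):
    # Hypothetical example: the unit circle, G(x, y) = x^2 + y^2 - 1
    return x * x + y * y - 1.0

A, B = 0.0, 1.0          # base point (a, b) with G(a, b) = 0
D2G_AB = 2.0 * B         # D_2 G(a, b) = 2b = 2, nonzero, so the theorem applies

def h_approx(x, steps=50):
    """Successive approximations h_0 = b, h_{k+1} = h_k - D_2G(a,b)^{-1} G(x, h_k)."""
    y = B                # h_0(x) = b
    for _ in range(steps):
        # The matrix D_2G(a, b) is frozen at the base point; it is never recomputed.
        y = y - G(x, y) / D2G_AB
    return y

for x in (0.0, 0.2, 0.4):
    # Compare the limit of the iteration with the explicit solution sqrt(1 - x^2).
    print(x, h_approx(x), math.sqrt(1 - x * x))
```

Only values of $G$ and the single number $D_2G(a,b)$ are used; the explicit formula appears only as a check.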

for $\mathbf{x} \in U$.

Theorem $3.3$: Suppose that the mapping $f:\mathscr{R}^n\rightarrow\mathscr{R}^n$ is $\mathscr{C}^1$ in a neighborhood $W$ of the point $a$, with the matrix $f'(a)$ nonsingular. Then $f$ is locally invertible: there exist neighborhoods $U\subset W$ of $a$ and $V$ of $b=f(a)$, and a one-to-one $\mathscr{C}^1$ mapping $g:V\rightarrow W$ such that $$g(f(x))=x\quad\text{for } x \in U,$$ $$f(g(y))=y\quad\text{for } y \in V.$$ In particular, the local inverse $g$ is the limit of the sequence $\{g_k\}_{k=0}^\infty$ of successive approximations defined inductively by $$g_0(y)=a,\quad g_{k+1}(y)=g_k(y)-f'(a)^{-1}[f(g_k(y))-y]$$
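The inverse-mapping scheme can be sketched the same way in $\mathscr{R}^2$. The map $f(x_1,x_2)=(2x_1+x_2+x_2^2,\ x_1+3x_2+x_1^2)$ with $a=(0,0)$ is a hypothetical example; its derivative $f'(a)=\begin{bmatrix}2&1\\1&3\end{bmatrix}$ has determinant $5$, so it is nonsingular. The helper `apply_inverse` is also an illustration only (Cramer's rule for a $2\times 2$ matrix).

```python
def f(x1, x2):
    # Hypothetical example map R^2 -> R^2 with f(0, 0) = (0, 0)
    return (2 * x1 + x2 + x2 ** 2, x1 + 3 * x2 + x1 ** 2)

# f'(a) at a = (0, 0) is [[2, 1], [1, 3]]; its determinant 5 is nonzero.
J = ((2.0, 1.0), (1.0, 3.0))

def apply_inverse(M, v):
    """Return M^{-1} v for a 2x2 matrix M (Cramer's rule)."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return ((s * v[0] - q * v[1]) / det, (-r * v[0] + p * v[1]) / det)

def g_approx(y, a=(0.0, 0.0), steps=60):
    """g_0(y) = a, g_{k+1}(y) = g_k(y) - f'(a)^{-1} [f(g_k(y)) - y]."""
    x = a
    for _ in range(steps):
        fx = f(*x)
        residual = (fx[0] - y[0], fx[1] - y[1])   # f(g_k(y)) - y
        dx = apply_inverse(J, residual)           # f'(a)^{-1} applied to it
        x = (x[0] - dx[0], x[1] - dx[1])
    return x

y = (0.3, -0.2)
x = g_approx(y)
print(x, f(*x))   # f(g(y)) should reproduce y up to rounding
```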

Question $1$:

As I understand it, the inverse function theorem uses the implicit function theorem to guarantee that there exists a relationship (a function) giving $y$ in terms of $x$, though not explicitly. But the iterative form doesn't make sense to me: why does applying the inverse Jacobian $f'(a)^{-1}$ to $[f(g_k(y))-y]$ give better and better approximations of $g(y)$? What I know is that the Jacobian approximates $f$ locally by a linear transformation. So

what information is encoded in $f'(a)^{-1}$ in theorem $3.3$ and in $D_{2} G(\mathbf{a}, \mathbf{b})^{-1}$ in theorem $3.4$?

Question $2$:

What is the main difference between these two theorems, and what is the motivation and intuition behind each?

Maybe I am asking too many questions for a single thread, but they are related to each other and all aim at understanding a single theorem, so I have put them together. It would be a great help if anyone could explain them.

Thanks, @emonHR. I have removed that question from the post. Thanks again for the link. Could you say something about the remaining questions? — falamiw, Dec 07 '21 at 14:38
The inverse and implicit function theorems are equivalent. Each implies the other (and the proof of equivalence is quite straightforward), so there is absolutely no difference between them; it's just a stylistic preference, where sometimes one theorem may be more directly/obviously applicable than the other. The main motivation for the theorems is the linear case. I suggest you read this answer of mine for the idea behind the invertibility condition, and for where those approximating forms $(h_k)$ come from. — peek-a-boo, Mar 03 '22 at 21:39
These proofs rely on the idea of "successive approximations": you first start out with a (not necessarily good) guess for what $y$ should be. Based on that, you find another approximation to linear order, and then you keep going. The technical tool which makes all of this work out is Banach's contraction mapping fixed point theorem. As a slightly tangential example, if I asked you to approximate $\sqrt{3}$ to 10 decimal places, how would you do it? Well, you guess, of course. Guess $y_0=1$; this is too small, so you might guess $y_1=1.7$ next, but this is still too small, so guess again... — peek-a-boo, Mar 03 '22 at 21:46
Regarding question 1, the "division" by those inverses guarantees that the "slope" is less than 1 and so we have a contraction mapping, so that we can apply Banach's fixed-point theorem. This of course needs to be made rigorous. — Behnam Esmayli, Mar 08 '22 at 06:52
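Combining the $\sqrt{3}$ example with the contraction remark, here is a small sketch (my own illustration, not from the comments) of the same frozen-derivative scheme in one variable: solve $x^2=3$ starting from the guess $1.7$, dividing by the slope $f'(1.7)=3.4$ at every step. The update map $T(x)=x-(x^2-3)/3.4$ has $T'(x)=1-x/1.7$, which is tiny near $\sqrt{3}$, so each error should be roughly $T'(\sqrt 3)\approx -0.019$ times the previous one. The function name `frozen_slope_iterates` is hypothetical.

```python
import math

def frozen_slope_iterates(target=3.0, guess=1.7, steps=6):
    """Solve x**2 = target with x_{k+1} = x_k - (x_k**2 - target) / f'(guess).

    The slope f'(guess) = 2*guess is computed once and then frozen, mirroring
    the fixed matrices f'(a)^{-1} and D_2G(a,b)^{-1} in the theorems.
    """
    slope = 2.0 * guess
    xs = [guess]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - (x * x - target) / slope)
    return xs

xs = frozen_slope_iterates()
root = math.sqrt(3.0)
errors = [x - root for x in xs]
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
# Near the root, each ratio should be close to T'(sqrt 3) = 1 - sqrt(3)/1.7,
# the derivative of the update map T(x) = x - (x^2 - 3)/3.4.
predicted = 1.0 - root / 1.7
for k, (e, r) in enumerate(zip(errors[1:], ratios), start=1):
    print(k, e, r)
print("predicted ratio:", predicted)
```

Dividing by the slope at the guess is what makes $|T'|$ small near the root; dividing by a badly wrong slope (or not dividing at all) would give a ratio near or above $1$ and the iteration would no longer contract.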

score 2 · Accepted Answer · answered Mar 09 '22 at 07:50

Regarding question $2$,

You can think of the equation $y=f(x)$ of theorem $3.3$ as $G(x,y)=f(x)-y=0$ and apply theorem $3.4$, with the roles of the two variables interchanged (we now solve for $x$ in terms of $y$), to get the inverse mapping $g$ such that $x=g(y)$. The relevant partial derivative is then $D_1G(a,b)=f'(a)$, which is nonsingular by hypothesis. The successive approximations become exactly those of theorem $3.3$: $g_{k+1}(y)=g_k(y)-D_1G(a,b)^{-1}G(g_k(y),y)=g_k(y)-f'(a)^{-1}[f(g_k(y))-y]$. I am not explicitly mentioning the neighborhoods, as you seem to understand them.
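A quick numerical check of this reduction, as a sketch: the scalar map $f(x)=2x+x^3$ with $a=0$ and $f'(a)=2$ is a hypothetical example. Running the theorem $3.3$ scheme and the theorem $3.4$ scheme (applied to $G(x,y)=f(x)-y$, solving for $x$) should produce literally the same sequence of iterates.

```python
def f(x):
    # Hypothetical scalar example: f(x) = 2x + x**3, with a = 0 and f'(a) = 2
    return 2.0 * x + x ** 3

a, fprime_a = 0.0, 2.0

def G(x, y):
    # The reduction from the answer: y = f(x) becomes G(x, y) = f(x) - y = 0
    return f(x) - y

def inverse_iterates(y, steps=8):
    """Theorem 3.3 scheme: g_{k+1} = g_k - f'(a)^{-1} [f(g_k) - y]."""
    gs = [a]
    for _ in range(steps):
        gs.append(gs[-1] - (f(gs[-1]) - y) / fprime_a)
    return gs

def implicit_iterates(y, steps=8):
    """Theorem 3.4 scheme with the roles swapped: solve G(x, y) = 0 for x via
    x_{k+1} = x_k - D_1G(a, b)^{-1} G(x_k, y), where D_1G(a, b) = f'(a)."""
    D1G_ab = fprime_a
    xs = [a]
    for _ in range(steps):
        xs.append(xs[-1] - G(xs[-1], y) / D1G_ab)
    return xs

y = 0.2
print(inverse_iterates(y) == implicit_iterates(y))   # expected: True
```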

And I am pretty sure that the book uses theorem $3.3$ to prove theorem $3.4$, which completes the equivalence of the two theorems.

Now, coming to question $1$: I can't imagine a better answer than the one @peek-a-boo wrote here. Let me quote its main idea, adapted to the context of your question:

Near the point $(a,b)$, where $G(a,b) = 0$, we can use the power of differential calculus to say \begin{align}0 &= G(x,y) \\ &\approx \underbrace{G(a,b)}_{0}+D_1G(a,b) \cdot (x-a) + D_2G(a,b) \cdot (y-b) \quad \text{if $(x,y)$ is near $(a,b)$} \tag{$*$} \end{align} The approximation is better the closer $(x,y)$ is to $(a,b)$. So, if in this general case we impose the condition that $D_2G(a,b)$ is invertible (i.e. its determinant is nonzero), then we get \begin{align} y \approx - (D_2G(a,b))^{-1} \cdot D_1G(a,b) \cdot (x-a) + b \end{align}
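The iteration of theorem $3.4$ comes from applying the same linearization repeatedly, not only at $(a,b)$. A short derivation, assuming we linearize in $y$ around the current approximation $(x, h_k(x))$ and then replace the derivative there by the fixed matrix $D_2G(a,b)$ (which is close to it by continuity when everything stays near $(a,b)$):
$$
\begin{aligned}
0 = G(x, y) &\approx G\big(x, h_k(x)\big) + D_2G\big(x, h_k(x)\big)\,\big(y - h_k(x)\big) \\
&\approx G\big(x, h_k(x)\big) + D_2G(a, b)\,\big(y - h_k(x)\big).
\end{aligned}
$$
Solving for $y$ and calling the result $h_{k+1}(x)$ gives exactly
$$ h_{k+1}(x) = h_k(x) - D_2G(a,b)^{-1}\, G\big(x, h_k(x)\big). $$
Equivalently, the solutions of $G(x,y)=0$ are the fixed points of $\varphi_x(y) = y - D_2G(a,b)^{-1}G(x,y)$, and
$$ D\varphi_x(y) = I - D_2G(a,b)^{-1} D_2G(x,y), $$
which is close to $0$ when $(x,y)$ is near $(a,b)$. So $D_2G(a,b)^{-1}$ is what turns $G$ into a contraction near $b$, which is the point of the comments about Banach's fixed point theorem.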

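If you understand the equivalence, then interpreting $f'(a)^{-1}$ should not cause you any trouble either: it plays the role that $D_2G(a,b)^{-1}$ plays above, once $G(x,y)=f(x)-y$ is solved for $x$ instead of $y$.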

Now, you might ask why everyone uses the linear version to answer this question. Why not use the local quadratic approximation $$ G(x, y) \approx G(a,b)+DG(a,b) \cdot\left[\begin{array}{l} x-a \\ y-b \end{array}\right]+\frac{1}{2}\left[\begin{array}{ll} (x-a)^{T} & (y-b)^{T} \end{array}\right] H_{G}(a, b)\left[\begin{array}{l} x-a \\ y-b \end{array}\right]? $$ Here $H_G$ is the Hessian matrix of $G$ (for vector-valued $G$ the quadratic term is taken componentwise, with one Hessian per component $G_i$). We don't actually try to get $h$ or $g$ in a single shot; instead, we hand the work over to the successive approximation scheme. Nothing forces us to use linearization, but the higher the order of the approximation, the more computation each step needs: the linear model is solved with one fixed inverse matrix, whereas a quadratic model already leads to a system of quadratic equations in $y$. That is not computationally efficient.
