I am working through Advanced Calculus of Several Variables by C.H. Edwards, Jr., and I have not been able to follow the logic of Theorem III-$3.4$, stated below:
Theorem $3.4$: Let the mapping $G: \mathscr{R}^{m+n} \rightarrow \mathscr{R}^{n}$ be $\mathscr{C}^{1}$ in a neighborhood of the point $(a,b)$ where $G(a,b)=0$. If the partial derivative matrix $D_{2} G(a, b)$ is nonsingular, then there exists a neighborhood $U$ of $a$ in $\mathscr{R}^{m}$, a neighborhood $W$ of $(a, b)$ in $\mathscr{R}^{m+n}$, and a $\mathscr{C}^{1}$ mapping $h: U \rightarrow \mathscr{R}^{n}$, such that $y=h(x)$ solves the equation $G(x, y)=0$ in $W$.
In particular, the implicitly defined mapping $h$ is the limit of the sequence of successive approximations defined inductively by
$$ \begin{aligned} &\qquad h_{0}(\mathbf{x})=\mathbf{b}, \quad h_{k+1}(\mathbf{x})=h_{k}(\mathbf{x})-D_{2} G(\mathbf{a}, \mathbf{b})^{-1} G\left(\mathbf{x}, h_{k}(\mathbf{x})\right) \end{aligned} $$
for $\mathbf{x} \in U$.
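To see the iteration of Theorem $3.4$ in action, here is a minimal Python sketch for a hypothetical example that is not from the book: $G(x,y)=x^2+y^2-1$ with $m=n=1$ and $(a,b)=(0.6,0.8)$. Then $G(a,b)=0$ and $D_2G(a,b)=2b=1.6$ is a nonzero $1\times 1$ matrix. The names `G`, `h`, and `D2G_AB` are illustrative only.

```python
import math

# Hypothetical example (not from Edwards): G(x, y) = x**2 + y**2 - 1, m = n = 1,
# at the point (a, b) = (0.6, 0.8), where G(a, b) = 0.
A, B = 0.6, 0.8

def G(x, y):
    return x**2 + y**2 - 1.0

# D_2 G(a, b): partial derivative of G with respect to y at (a, b), i.e. 2*b.
D2G_AB = 2.0 * B

def h(x, iterations=100):
    """Successive approximations: h_0(x) = b,
    h_{k+1}(x) = h_k(x) - D_2G(a, b)^{-1} G(x, h_k(x))."""
    y = B  # h_0(x) = b
    for _ in range(iterations):
        # Note: the derivative is frozen at (a, b); it is not recomputed at (x, y).
        y = y - G(x, y) / D2G_AB
    return y

for x in (0.5, 0.6, 0.7):
    # Compare the limit of the iteration with the explicit solution sqrt(1 - x^2).
    print(x, h(x), math.sqrt(1.0 - x**2))
```

For each $x$ near $0.6$, `h(x)` should agree with the explicit branch $\sqrt{1-x^2}$ to within floating-point error, even though the iteration only ever uses the single number $D_2G(a,b)$.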
Theorem $3.3$: Suppose that the mapping $f:\mathscr{R}^n\rightarrow\mathscr{R}^n$ is $\mathscr{C}^1$ in a neighborhood $W$ of the point $a$, with the matrix $f'(a)$ nonsingular. Then $f$ is locally invertible: there exist neighborhoods $U\subset W$ of $a$ and $V$ of $b=f(a)$, and a one-to-one $\mathscr{C}^1$ mapping $g:V\rightarrow W$ such that $$g(f(x))=x\quad\text{for } x \in U,$$ $$f(g(y))=y\quad\text{for } y \in V.$$
In particular, the local inverse $g$ is the limit of the sequence $\{g_k\}_{k=0}^\infty$ of successive approximations defined inductively by $$g_0(y)=a,\quad g_{k+1}(y)=g_k(y)-f'(a)^{-1}[f(g_k(y))-y].$$
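The iteration of Theorem $3.3$ can likewise be tried numerically. The following is a minimal Python sketch for a hypothetical map that is not from the book, $f(u,v)=(u+0.1v^2,\;2v+0.1u^2)$ near $a=(0,0)$, where $b=f(a)=(0,0)$ and $f'(a)=\begin{pmatrix}1&0\\0&2\end{pmatrix}$ is nonsingular. The helpers `inv2` and `matvec` are illustrative stand-ins for matrix inversion and multiplication, written out so the block needs no external library.

```python
# Hypothetical example (not from Edwards): f(u, v) = (u + 0.1*v**2, 2*v + 0.1*u**2)
# near a = (0, 0), with b = f(a) = (0, 0).

def f(p):
    u, v = p
    return (u + 0.1 * v**2, 2.0 * v + 0.1 * u**2)

def inv2(m):
    """Inverse of a nonsingular 2x2 matrix given as ((p, q), (r, s))."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return ((s / det, -q / det), (-r / det, p / det))

def matvec(m, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return (m[0][0] * x[0] + m[0][1] * x[1],
            m[1][0] * x[0] + m[1][1] * x[1])

A_POINT = (0.0, 0.0)
F_PRIME_A = ((1.0, 0.0), (0.0, 2.0))  # f'(a), computed by hand for this f
F_PRIME_A_INV = inv2(F_PRIME_A)        # computed once and reused at every step

def g(y, iterations=100):
    """Successive approximations: g_0(y) = a,
    g_{k+1}(y) = g_k(y) - f'(a)^{-1} [f(g_k(y)) - y]."""
    x = A_POINT
    for _ in range(iterations):
        fx = f(x)
        residual = (fx[0] - y[0], fx[1] - y[1])  # f(g_k(y)) - y
        step = matvec(F_PRIME_A_INV, residual)
        x = (x[0] - step[0], x[1] - step[1])
    return x

y = (0.3, 0.4)
x = g(y)
print(x, f(x))  # f(x) should reproduce y up to rounding
```

For $y$ close to $b$, `f(g(y))` returns $y$ to within floating-point error, which is the property $f(g(y))=y$ for $y\in V$ from the theorem.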
Question $1$:
My understanding is that the inverse function theorem uses the implicit function theorem to guarantee that there exists a relationship (a function) expressing $y$ in terms of $x$, even if not explicitly. But the iterative form doesn't make sense to me: why does applying the inverse Jacobian $f'(a)^{-1}$ to $[f(g_k(y))-y]$ give better and better approximations of $g(y)$? All I know is that the Jacobian approximates $f$ locally by a linear transformation.
So what information is encoded in $f'(a)^{-1}$ in Theorem $3.3$, and in $D_{2} G(\mathbf{a}, \mathbf{b})^{-1}$ in Theorem $3.4$?
Question $2$:
What is the main difference between these two theorems, and what is the motivation and intuition behind each?
I may be asking too many questions for a single thread, but they are closely related and all aim at understanding a single theorem, so I have put them together. Any explanation would be a great help.