Help in understanding a terse proof of an alternate version of Implicit function theorem

Question

Statement: Let $U\subset \mathbb R^{m+n}$ be an open set, $a\in U$ and $f: U\to \mathbb R^n$ be a $C^1$-map such that $f(a) = 0$ and $Df(a)$ is surjective. Then there exists a diffeomorphism $\phi: V\to \phi(V)$ of a neighbourhood $V$ of $0$ onto a neighbourhood of $a$ such that $f\circ\phi(x_1,x_2,\cdots, x_{m+n})=(x_1,x_2,\cdots,x_n)$.

The statement is to be proven using the implicit function theorem. The proof I have seems terse, and I don't understand many of its steps (highlighted in red below).

Proof: In other words, the conclusion of the theorem is that after a $\color{red}{\text{change of coordinates}}$ in the domain, the function $f$ coincides with the projection map in a small neighborhood of the origin.

By performing a translation, we may assume that $a = 0$. Consider the $n \times (n + m)$ matrix $A$ corresponding to $Df(0)$, which is of rank $n$. By performing column operations we can bring this to the form in which $a_{ij} = \delta_{ij}$, $1 \le i \le n$, $1 \le j \le n + m$. To sum up, we can first perform $\color{red}{\text{an affine linear change of coordinates}}$ on $\mathbb R^{n+m}$ so that, with respect to $\color{red}{\text{the new coordinates}}$, the point $\color{red}{\text{$a$ is the origin}}$ and the given map $f = (f_1, \dots , f_n)$ has the property that $$\frac{\partial f_i}{\partial x_j}(0)=\delta_{ij}, \quad 1\le i\le n,\ 1\le j\le m+n.\tag 1$$

Consider the map $h : U\to \mathbb R^{n+m}$ defined by $h := (h_1, \dots , h_{n+m})$ where $$h_i(x_1, \dots , x_{n+m}) =\begin{cases}f_i(x_1,x_2,\cdots, x_{n+m}), & i\le n\\ x_i, & i\ge n+1. \end{cases}\tag 2$$ Then $D(h)(0)$ is invertible and hence by the inverse function theorem, $h$ has an inverse $\phi$, which is continuously differentiable in a small neighborhood of $0$. Writing $(x_1, \dots , x_n) = x, (x_{n+1}, \dots , x_{n+m}) = y$, we have $\color{red}{(x_1, \dots , x_{n+m}) = (x, y) = h \circ \phi(x, y) = (f \circ \phi(x, y), y)}$. Therefore, $f \circ \phi(x_1, \dots , x_{n+m}) = (x_1, \dots , x_n)$ near $0$.

I don't understand the red highlighted parts of this proof. Moreover, it feels like I am memorizing this proof, as it doesn't seem intuitive at all. Can anyone please help me with this proof and also suggest where I can learn this in an intuitive way?

Is there an alternative proof to this terse proof? Thanks.

peek-a-boo · Answer 1 · 2022-09-16T05:32:30.320

The proof given is actually the standard way of showing the inverse function theorem implies the implicit function theorem.

A change of coordinates on the domain means instead of looking at your original $f$, you consider $f\circ \alpha$ for some diffeomorphism $\alpha$. Changing coordinates on the target means considering $\beta\circ f$ instead of $f$, for some diffeomorphism $\beta$. Affine change of coordinates on the domain (resp. target) means $\alpha$ (resp. $\beta$) has to be an affine map.
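
To make the ‘affine change of coordinates’ concrete, here is a sketch under one extra assumption that is not stated in the proof: the first $n$ columns of $A=Df(a)$ are linearly independent (since $A$ has rank $n$, this can always be arranged by permuting the coordinates). Write $A=\begin{pmatrix}A_1&A_2\end{pmatrix}$ with $A_1\in M_{n\times n}(\Bbb{R})$ invertible, and set \begin{align} P=\begin{pmatrix}A_1^{-1}&-A_1^{-1}A_2\\ 0&I_m\end{pmatrix},\qquad \alpha(x)=a+Px. \end{align} Then $\alpha$ is an affine diffeomorphism of $\Bbb{R}^{n+m}$ with $\alpha(0)=a$, and by the chain rule \begin{align} D(f\circ\alpha)(0)=Df(a)\,P=\begin{pmatrix}A_1A_1^{-1}&-A_1A_1^{-1}A_2+A_2\end{pmatrix}=\begin{pmatrix}I_n&0\end{pmatrix}, \end{align} which is condition (1) for $f\circ\alpha$. Right multiplication by the invertible matrix $P$ amounts to a sequence of column operations on $A$. If $\tilde{\phi}$ satisfies the conclusion of the theorem for $f\circ\alpha$, then $\phi=\alpha\circ\tilde{\phi}$ satisfies it for the original $f$.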

The idea of the proof is actually very simple; you’re converting a ‘non-square’ system of equations into a ‘square’ system of equations by enlarging the target space (from $\Bbb{R}^n$ to $\Bbb{R}^{n+m}$); this is what’s going on when introducing $h$ in terms of $f$.
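
In terms of derivatives, a short check (using the notation of (1) and (2) from the question, with $x\in\Bbb{R}^n$, $y\in\Bbb{R}^m$) shows why the enlarged map is invertible to first order: \begin{align} Dh(0)=\begin{pmatrix}\frac{\partial f}{\partial x}(0)&\frac{\partial f}{\partial y}(0)\\ 0&I_m\end{pmatrix}=\begin{pmatrix}I_n&0\\0&I_m\end{pmatrix}=I_{n+m}, \end{align} where the top row is $Df(0)$, simplified by (1), and the bottom row comes from $h_i(x,y)=x_i$ for $i\ge n+1$. Without the normalisation (1), one would only need the block $\frac{\partial f}{\partial x}(0)$ to be invertible, since then $\det Dh(0)=\det\frac{\partial f}{\partial x}(0)\neq 0$.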

Let us ignore non-linearities and suppose for simplicity that you have matrices $A\in M_{n\times n}(\Bbb{R})$, $B\in M_{n\times m}(\Bbb{R})$, and consider the equation $\xi=Ax+By$ with $x\in\Bbb{R}^n,y\in\Bbb{R}^m$, and hence $\xi\in\Bbb{R}^n$. If I told you that the matrix $A$ was invertible, then you’d immediately be able to tell me how to write $x$ in terms of $y$ and $\xi$: \begin{align} x=A^{-1}(\xi-By). \end{align} That’s the obvious way of doing it. But another way of approaching the question is to consider a system of equations: \begin{align} \begin{cases} \xi&=Ax+By\\ \eta&=y, \end{cases} \end{align} so $\xi\in\Bbb{R}^n,\eta\in\Bbb{R}^m$, and the goal is to write $x,y$ in terms of $\xi,\eta$. In block matrix notation, the equation is \begin{align} \begin{pmatrix} \xi\\ \eta \end{pmatrix}&= \begin{pmatrix} A&B\\ 0&I_m \end{pmatrix}\cdot \begin{pmatrix} x\\ y \end{pmatrix}, \end{align} and the solution is \begin{align} \begin{pmatrix} x\\ y \end{pmatrix}&= \begin{pmatrix} A & B\\ 0&I_m \end{pmatrix}^{-1}\cdot \begin{pmatrix} \xi\\ \eta \end{pmatrix} =\begin{pmatrix} A^{-1}&-A^{-1}B\\ 0&I_m \end{pmatrix}\cdot \begin{pmatrix} \xi\\ \eta \end{pmatrix}. \end{align} Hence, by introducing the extra variable $\eta$, we were able to get a square system of equations, which is then easy to invert because of the block structure. This is exactly what’s going on in the proof given (except that they go one step further by assuming the block $A$ is just $I_n$; this is condition (1)). In this presentation, I assumed everything was linear, but the power of the inverse function theorem is that it allows us to carry out the analogous procedure even when non-linear functions are involved.
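
A small non-linear illustration may help (the example is chosen here purely for illustration and is not taken from the proof): take $n=m=1$ and $f(x,y)=x+xy+y^2$, so that $f(0,0)=0$ and $Df(0,0)=\begin{pmatrix}1&0\end{pmatrix}$. The enlarged map is $h(x,y)=(x+xy+y^2,\,y)$. Solving $\xi=x(1+y)+y^2$, $\eta=y$ for $(x,y)$ gives, for $|\eta|<1$, \begin{align} \phi(\xi,\eta)=h^{-1}(\xi,\eta)=\left(\frac{\xi-\eta^2}{1+\eta},\,\eta\right), \end{align} and then \begin{align} f\circ\phi(\xi,\eta)=\frac{\xi-\eta^2}{1+\eta}\,(1+\eta)+\eta^2=\xi. \end{align} So in the new coordinates $f$ is just the projection onto the first coordinate, which is the conclusion of the theorem. Here the inverse could be written down by hand; in general the inverse function theorem only guarantees that it exists locally.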

I would suggest taking a look at “Trying to understand the statement of Rudin's Rank Theorem”. Also, this way of writing the implicit function theorem is very nice because it reflects the ‘infinitesimal to local’ interplay between derivatives and the actual functions; see “Coordinate change to make function linear on a neighborhood” for more remarks. Alternatively, if you want to avoid the inverse function theorem, then you could try (as a difficult exercise) to prove the implicit function theorem directly using Banach’s fixed point theorem, along with the motivating remarks in “Question regarding the requirement (determinant) for implicit function theorem”.

Thanks a lot for the answer :-). Suppose that $f: U\to V$ is a diffeomorphism, where $U$ and $V$ are open subsets of $\mathbb R^m$ and $\mathbb R^n$. I'm a little confused by the definition given in a book, which is as follows: "Given a diffeomorphism $f$, we also refer to it as a change of coordinates on $U$ by treating the coordinate functions $f_1, \dots , f_n$ of $f$ as the new coordinates on $U$." But I don't understand how your definition fits here. Can you please help me understand this? — Koro, Sep 16 '22 at 15:23
@Koro firstly, if $f$ is a diffeomorphism, then necessarily $m=n$. That’s saying the same thing as I did; a change of coordinates just means to consider a diffeomorphism between open subsets of $\Bbb{R}^n$. The purpose of coordinates is to assign to each point a corresponding $n$-tuple, i.e. this is a bijective mapping $\psi:\tilde{U}\subset M\to U\subset\Bbb{R}^n$. If you now have a diffeomorphism $f:U\to V$, then you can compose to get $f\circ \psi:\tilde{U}\to V$. Take a look at the following answer of mine for more about coordinates. — peek-a-boo, Sep 16 '22 at 20:35
