
[Image: statement of the Implicit Function Theorem from the textbook]

The picture gives part of the statement of the Implicit Function Theorem. I know that a nonzero determinant corresponds to linear independence of the equations in $\mathbb{R}^k$, but other than that I cannot see why $\det D_yF(a,b) \neq 0$ is required in order to solve $F(x,y)=0$ for $y$.

zucvoe
  • could you perhaps clarify/elaborate on what you mean by "I cannot seem to connect why $\det D_yF(a,b)$ cannot equal $0$ in order for $F(x,y)=0$"? because I think I might understand what you mean but I just want to be $100$ percent sure what your confusion is before answering – peek-a-boo Jun 23 '19 at 04:22
  • Yes! I can't understand why $\det D_yF(a,b) \neq 0$ is a requirement for the implicit function theorem to be applied. Is that still too vague? Let me know! – zucvoe Jun 23 '19 at 04:29

1 Answer


What I'll provide is a motivation for why we might impose/how we might come up with the condition $\det (D_yF(a,b)) \neq 0$. For the full explanation of where this fact is used, of course just refer to the proof in your book.

I hope you know that differential calculus is (roughly speaking) the theory of locally approximating functions by linear maps, because linear things are nice to work with. So the key idea behind the implicit function theorem, the inverse function theorem, or really any "big" theorem in differential calculus is to say to yourself

"Right now I have a very general and difficult problem. Can I solve this problem in the special case where everything is nice and linear? Can I then use the insight I gained from the linear case to solve the general case?"

So, in the spirit of this guiding principle, we consider a very special case: let $A \in M_{k \times n}(\Bbb{R})$ and $B \in M_{k \times k}(\Bbb{R})$, and define the function $G: \Bbb{R}^n \times \Bbb{R}^k \to \Bbb{R}^k$ by \begin{equation} G(x,y) = Ax + By. \end{equation} Now the question at hand is: if $G(x,y) = 0$, can I solve for $y$ in terms of $x$? The answer is pretty simple in this case, because if the matrix $B$ is invertible (i.e., $\det B \neq 0$), then \begin{equation} G(x,y) = 0 \end{equation} implies that \begin{align} Ax + By = 0, \end{align} and hence \begin{align} y = -(B^{-1}A)x. \end{align}
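As a quick sanity check of this special case, here is a minimal numerical sketch (the matrices $A$ and $B$ below are arbitrary illustrative choices of mine, with $B$ invertible; they are not part of the theorem):

```python
import numpy as np

# Hypothetical example with n = 3, k = 2.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # k x n
B = np.array([[2.0, 1.0],
              [1.0, 1.0]])        # k x k, det(B) = 1 != 0

x = np.array([1.0, -2.0, 0.5])    # any point in R^n

# Solve G(x, y) = Ax + By = 0 for y via y = -(B^{-1} A) x.
y = -np.linalg.solve(B, A @ x)

print(np.allclose(A @ x + B @ y, 0))  # True: G(x, y) = 0
```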

So, to solve the problem in this special case, we had to assume that $B$ is invertible (i.e., $\det B \neq 0$). This is the key insight we gained by solving the special linear case!

This is useful because in general the function $F$ you have been given in the theorem might be very complicated, so you don't know what it really looks like. However, near a point $(a,b)$, where $F(a,b) = 0$, we can use the power of differential calculus to say \begin{align} F(x,y) \approx D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) \quad \text{if $(x,y)$ is near $(a,b)$} \tag{$*$} \end{align} (the approximation being better the closer $(x,y)$ is to $(a,b)$)
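As a quick numerical check of $(*)$, here is a small sketch (using the circle example from the end of this answer, $F(x,y) = x^2 + y^2 - 1$ with $(a,b) = (1,0)$; this particular $F$ is just my choice of illustration):

```python
def F(x, y):
    return x**2 + y**2 - 1

a, b = 1.0, 0.0
DxF, DyF = 2 * a, 2 * b   # partial derivatives of F at (a, b)

def F_approx(x, y):
    # right-hand side of (*)
    return DxF * (x - a) + DyF * (y - b)

print(F(0.9, 0.1), F_approx(0.9, 0.1))      # -0.18 vs -0.2
print(F(0.99, 0.01), F_approx(0.99, 0.01))  # -0.0198 vs -0.02 (closer)
```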

Now, the actual question you're being asked is: if $F(x,y) = 0$, can we solve for $y$ in terms of $x$ (at least for $(x,y)$ close to $(a,b)$)? This is a difficult problem, but we can use the linear approximation ($*$) to get a rough idea: we have that \begin{align} 0 &= F(x,y) \\ & \approx D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b). \end{align} Notice how this is almost the situation we had above with the function $G$: here, $A = D_xF(a,b)$ and $B = D_yF(a,b)$. So, if in this general case we impose the condition that $D_yF(a,b)$ is invertible (i.e., its determinant is nonzero), then we get \begin{align} y \approx -(D_yF(a,b))^{-1} \cdot D_xF(a,b) \cdot (x-a) + b. \end{align}
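To see this approximate solution in action, here is a sketch (again with the circle $F(x,y) = x^2 + y^2 - 1$, this time at $(a,b) = (3/5, 4/5)$, where $D_yF(a,b) = 8/5 \neq 0$; the exact solution $y = \sqrt{1-x^2}$ is available for comparison only because this example is so simple):

```python
import numpy as np

a, b = 3/5, 4/5            # a point with F(a, b) = 0 and D_yF(a, b) != 0
DxF, DyF = 2 * a, 2 * b    # 6/5 and 8/5

for x in [0.7, 0.65, 0.61]:
    y_approx = b - (DxF / DyF) * (x - a)  # the linearized solution above
    y_exact = np.sqrt(1 - x**2)           # the true solution near (a, b)
    print(x, y_approx, y_exact)           # agreement improves as x -> a
```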

Thus, we have used our knowledge of the exact solution in the special linearized case to get a "rough approximate solution" in the general case. Now, all that remains to rigorously prove the theorem is to do some detailed and technical analysis of all the error terms wherever I said $\approx$ above, and to show that even in the general case, we really can solve for $y$ in terms of $x$, provided that $D_yF(a,b)$ is invertible (your book should cover all the detailed arguments).

This is the motivation for why we put $\det D_yF(a,b) \neq 0$ as part of our hypothesis, and it also outlines the thought process of how one might come up with such a requirement. Of course, after coming up with such a requirement, one can come up with examples to show that if this condition is not satisfied, then we cannot solve for $y$ in terms of $x$.


Indeed, a simple example showing that the conclusion of the theorem can fail without the assumption $\det D_yF(a,b) \neq 0$ is the following:

Let $k=n=1$ and define $F: \Bbb{R} \times \Bbb{R} \to \Bbb{R}$ by $F(x,y) = x^2 + y^2 - 1$. Choose $(a,b) = (1,0)$. Then clearly $F(1,0) = 0$ and $D_yF(1,0) = 0$ (this is a $1 \times 1$ matrix), so the determinant is also $0$.

Now, notice that the set of $(x,y)$ satisfying $F(x,y) = 0$ is the unit circle in the plane. It should be clear pictorially that near $(1,0)$ it is impossible to solve for $y$ as a function of $x$: every $x$ slightly less than $1$ corresponds to two points on the circle, and every $x > 1$ to none.

It was not possible in this case because the determinant was $0$. This shows why the determinant condition is required. (However, notice that $D_xF(1,0) = 2 \neq 0$, so we can solve for $x$ as a function of $y$)
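Numerically, the failure near $(1,0)$ looks like this (a sketch, same $F$ as above):

```python
import numpy as np

def F(x, y):
    return x**2 + y**2 - 1

# Just left of x = 1 there are TWO values of y solving F(x, y) = 0 ...
x = 0.99
y1, y2 = np.sqrt(1 - x**2), -np.sqrt(1 - x**2)
print(np.isclose(F(x, y1), 0), np.isclose(F(x, y2), 0))  # True True

# ... and just right of x = 1 there are NONE, since 1 - x^2 < 0.
print(1 - 1.01**2)  # negative: no real y solves F(x, y) = 0
```

So no single function $y = g(x)$ can describe the solution set on any neighborhood of $(1,0)$.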


Edit in response to comments:

Recall that in general, by definition, a function $F: \Bbb{R}^p \to \Bbb{R}^m$ is differentiable at $\alpha$ if there is an $m \times p$ matrix $T$ such that \begin{equation} F(\xi) - F(\alpha) = T(\xi - \alpha) + o(\lVert\xi - \alpha \rVert). \end{equation} If $F$ is differentiable at $\alpha$, then $T$ is unique, and we denote it by the symbol $DF(\alpha)$; i.e., we can approximate the change $F(\xi)-F(\alpha)$ by a linear part $DF(\alpha) \cdot (\xi -\alpha)$, and the approximation is valid up to an accuracy of little-oh.

In your particular case, write $p = n+k$, $\xi = \begin{bmatrix} x \\y \end{bmatrix} $, and write $\alpha = (a,b)$. Note that we have the following block matrix decomposition: \begin{align} DF(a,b) = \begin{bmatrix} D_xF(a,b) & D_yF(a,b) \end{bmatrix} \end{align} Hence, we get \begin{align} F(x,y) &= F(a,b) + DF(a,b) \cdot \begin{bmatrix} x-a \\ y-b \end{bmatrix} + o(\lVert (x,y) - (a,b)\rVert) \\ &= F(a,b) + \begin{bmatrix} D_xF(a,b) & D_yF(a,b) \end{bmatrix} \cdot \begin{bmatrix} x-a \\ y-b \end{bmatrix} + o(\lVert (x,y) - (a,b)\rVert) \\ &= F(a,b) + D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) + o(\lVert (x,y) - (a,b)\rVert) \end{align}

This is the proper statement in general, and everything is an equal sign (there are no approximations, because we already took the error term into account with the little-oh notation). In the case of the implicit function theorem, we have $F(a,b) = 0$ by assumption. Hence, we get the statement \begin{equation} F(x,y) = D_xF(a,b) \cdot (x-a) + D_yF(a,b) \cdot (y-b) + o(\lVert (x,y) - (a,b)\rVert) \end{equation}
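One can also watch the little-oh term vanish numerically (a sketch with the same circle example as above): the remainder of the linear approximation, divided by $\lVert (x,y) - (a,b) \rVert$, should tend to $0$ as $(x,y) \to (a,b)$.

```python
import numpy as np

def F(x, y):
    return x**2 + y**2 - 1

a, b = 1.0, 0.0
DxF, DyF = 2 * a, 2 * b

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    x, y = a - t, b + t                       # approach (a, b) along a line
    linear = DxF * (x - a) + DyF * (y - b)
    remainder = F(x, y) - linear              # the o(||(x,y) - (a,b)||) term
    print(remainder / np.hypot(x - a, y - b)) # -> 0 as t -> 0
```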

(In my above explanation, I was too lazy to carry around the little-oh, so I just wrote $\approx$ everywhere instead)

peek-a-boo
  • Thanks a lot for the thorough explanation, I understand now. However, I don't quite get why the approximation $F(x,y) \approx D_xF(a,b) \cdot x + D_yF(a,b) \cdot y$ makes sense. I used an example in just one dimension for the domain and range where $f(x) = x^2$: if we let $x = 7$, then $f'(6.99) \cdot 7$ would be nowhere close to $f(7)$. Am I using a relevant comparison? – zucvoe Jun 23 '19 at 05:53
  • you're right, there's a typo there, I'll fix it. But to make the comparison clear, try something with two dimensions in the domain, and one dimension in the target space – peek-a-boo Jun 23 '19 at 06:02
  • For instance, try the one at the end, where $F(x,y) = x^2 + y^2 -1$, with $(a,b) = (1,0)$. Then, $F(a,b) = 0$ is clear. What I'm saying is that $F(x,y) \approx D_xF(a,b) \cdot (x-a) + D_yF(a,b) (y-b)$, which in this case simplifies to $F(x,y) \approx 2(x-1) + 0$. So try computing $F(0.9,0.1)$ exactly, and approximately. The exact answer is $-0.18$, the approximate answer is $-0.2$, so you can see they are indeed close. – peek-a-boo Jun 23 '19 at 06:10
  • @JohnnyYang If you want to see a proper proof of that fact in the general case, let me know; I'll edit my answer appropriately. – peek-a-boo Jun 23 '19 at 06:16
  • Ohh I see, and I tried using the same example you used but instead with $(a,b) = (10, 12)$ and $(x,y) = (9.99, 11.99)$. I found that the closer $x$ and $y$ are to $a$ and $b$, the further off the approximation is when I plug in the computed values. Maybe seeing the general proof will help me visualize how the approximation works? – zucvoe Jun 23 '19 at 15:46
  • @Johnny Yang in this specific case, add $F(a, b) $ to your approximation. Then everything should work nicely. Notice that in my specific choice of $(a, b)$, this term was 0. Sure, I'll edit my answer later – peek-a-boo Jun 23 '19 at 15:51
  • @JohnnyYang I have edited the answer to include the justification for the approximation – peek-a-boo Jun 23 '19 at 17:34
  • I get it now, thank you!! – zucvoe Jun 24 '19 at 03:02