This question is about a particular strategy, which I find intuitively appealing, for proving the existence of Lagrange multipliers. The proof is a great application of the "four subspaces theorem" from linear algebra, which states that $N(A)^\perp = R(A^T)$. I will describe the strategy, then state the question below.
First of all, here is the theorem.
Theorem 1 (Lagrange multiplier optimality condition). Let $f:\mathbb R^n \to \mathbb R$ and $g:\mathbb R^n \to \mathbb R^c$ be continuously differentiable functions. (Here $c < n$.) Suppose that $x^* \in \mathbb R^n$ is a local minimizer for the optimization problem \begin{align} \tag{1} \mathop{\text{minimize}}_{x \in \mathbb R^n} & \quad f(x) \\ \text{subject to} & \quad g(x) = 0. \end{align} If the $c \times n$ derivative matrix $g'(x^*)$ has full rank (so its rank is $c$), then there exists a vector $\lambda \in \mathbb R^c$ such that \begin{equation*} \nabla f(x^*) = g'(x^*)^T \lambda. \end{equation*}
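As a quick sanity check on the statement, take $n = 2$, $c = 1$, and consider minimizing $f(x_1, x_2) = x_1 + x_2$ subject to $g(x_1, x_2) = x_1^2 + x_2^2 - 1 = 0$. The minimizer is $x^* = (-1/\sqrt 2, -1/\sqrt 2)$, where $\nabla f(x^*) = (1, 1)$ and $g'(x^*) = \begin{bmatrix} -\sqrt 2 & -\sqrt 2 \end{bmatrix}$, which has full rank. The theorem then guarantees a multiplier, and indeed $\lambda = -1/\sqrt 2$ satisfies $$ g'(x^*)^T \lambda = (1, 1) = \nabla f(x^*). $$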
The above theorem has a very clear proof in the case where $g$ is affine. In this case, the optimization problem can be written as \begin{align*} \mathop{\text{minimize}}_{x \in \mathbb R^n} & \quad f(x) \\ \text{subject to} & \quad Ax = b \end{align*} where $A$ is a real $c \times n$ matrix and $b \in \mathbb R^c$. We will make use of the "four subspaces theorem" from linear algebra, which tells us that $N(A)^\perp = R(A^T)$. Suppose that $u \in N(A)$, so $Au = 0$. Then $A(x^* + tu) = Ax^* = b$ for every $t \in \mathbb R$, so the entire line through $x^*$ in the direction $u$ is feasible. It must therefore be true that the directional derivative $D_u f(x^*)$ satisfies \begin{equation} \tag{2} D_u f(x^*) = \langle \nabla f(x^*), u \rangle = 0, \end{equation} because otherwise $x^*$ would not be a local minimizer. Indeed, if (2) were not satisfied, then we could decrease the value of $f$ without violating the constraint by moving away from $x^*$ a short distance in the direction $u$ or $-u$. Equation (2) tells us that $\nabla f(x^*)$ is orthogonal to $u$. Since $u$ was an arbitrary null vector of $A$, we conclude that $\nabla f(x^*) \in N(A)^\perp = R(A^T)$, which implies that $$ \nabla f(x^*) = A^T \lambda $$ for some vector $\lambda \in \mathbb R^c$. This completes the proof of theorem 1 in the special case where our optimization problem has linear constraints (that is, in the case where $g$ is affine). Notice that in this special case we did not need to assume that $A$ has full rank.
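To make the affine case concrete, consider minimizing $f(x) = x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$, so that $A = \begin{bmatrix} 1 & 1 \end{bmatrix}$ and $b = 1$. The minimizer is $x^* = (1/2, 1/2)$, and $N(A)$ is spanned by $u = (1, -1)$. Consistent with (2), we have $\langle \nabla f(x^*), u \rangle = \langle (1, 1), (1, -1) \rangle = 0$, and indeed $$ \nabla f(x^*) = (1, 1) = A^T \lambda \quad \text{with } \lambda = 1. $$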
In the general case where $g$ is not affine, it is tempting to replace $g$ with its local linear approximation near $x^*$ $$ \tag{3} g(x) \approx g(x^*) + g'(x^*)(x - x^*) $$ and invoke the above result for the case where the constraints are linear. If this were valid, it would immediately yield the conclusion of theorem 1!
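Spelled out, the linearized problem is \begin{align*} \mathop{\text{minimize}}_{x \in \mathbb R^n} & \quad f(x) \\ \text{subject to} & \quad g'(x^*)(x - x^*) = 0, \end{align*} where we have used $g(x^*) = 0$. This has exactly the form of the affine case above, with $A = g'(x^*)$ and $b = g'(x^*) x^*$.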
Trying to develop this idea into a rigorous proof, we might reason as follows. Let $u$ be a null vector of $g'(x^*)$, so that $g'(x^*) u = 0$. If we could move a short distance in the direction $u$ or $-u$ without violating the constraint $g(x) = 0$, then we could conclude, just as in the affine case, that $D_u f(x^*) = \langle \nabla f(x^*), u \rangle = 0$, since otherwise moving slightly in the direction $u$ or $-u$ would reduce the value of $f$ while remaining feasible. The conclusion of theorem 1 would then follow by the same reasoning as above.
Unfortunately, when we move away from $x^*$ a short distance in the direction $u$, the value of $g$ does not remain exactly zero, because the local linear approximation (3) is accurate only to first order: the approximation error is $o(\|x - x^*\|)$, but it is generally nonzero. So the constraint $g(x) = 0$ is violated, ever so slightly. This is an obstacle to our proof strategy.
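To see this concretely, take $g(x_1, x_2) = x_1^2 + x_2^2 - 1$ and $x^* = (1, 0)$. Then $g'(x^*) = \begin{bmatrix} 2 & 0 \end{bmatrix}$, and $u = (0, 1)$ is a null vector of $g'(x^*)$. Moving along the line $x^* + tu$ gives $$ g(x^* + tu) = 1 + t^2 - 1 = t^2, $$ which is nonzero for every $t \neq 0$: the constraint violation is second-order in $t$, precisely the error that the linear approximation (3) fails to capture.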
Question: How can this proof strategy be developed into a rigorous proof of theorem 1?
I would like to reach the "post-rigorous" stage of understanding theorem 1, where intuition and rigor are merged.
Here is a related question, but I'm posting this one to focus on how the above intuition can be converted into a rigorous proof. I'll post an answer below.