I am reviewing some basic calculus and am puzzled by introductory texts' justification for the Lagrange Multiplier Method, which is usually an argument about parallel gradients at extrema. But consider this example:
Let $f(x,y) = y^2$ and $g(x,y) = x - y$. Minimize $f$ subject to the constraint $g(x,y) = 0$.
Now clearly $f(0,0) = 0$ is the minimum, but at this point $\nabla f = (0,0)$, whereas $\nabla g = (1, -1)$, and so the justification of the Lagrange Multiplier Method (that the gradients are parallel at extrema) fails to hold here.
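For concreteness, here is the condition $\nabla f = \lambda \nabla g$ written out for this example (a sketch only; $\lambda$ is my notation for the multiplier, not taken from any particular text):

$$
\nabla f = (0,\, 2y), \qquad \nabla g = (1,\, -1), \qquad (0,\, 2y) = \lambda\,(1,\, -1), \qquad x - y = 0 .
$$

The first component forces $\lambda = 0$, the second then gives $y = 0$, and the constraint gives $x = 0$. So the system is formally satisfied at $(0,0)$, but only with $\lambda = 0$, i.e. with $\nabla f$ equal to the zero vector rather than a nonzero multiple of $\nabla g$.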
I am certainly missing some simple condition required to apply the method. Could anyone set me straight or point me towards a (hopefully self-contained) reference? Thanks!
Note: the above is similar to this question, but its accepted answer doesn't address the parallelism of the gradients.