General Info
As the OP pointed
out, a lot of relevant examples and exposition are covered in these
MIT OpenCourseWare notes on non-independent variables. However, I'll try to tie things together, use notation that is more common outside of physics, and confirm/clarify things relevant
to the question at hand.
Main Answer
Ignoring edge cases, this reciprocal rule $\dfrac{\partial y}{\partial x}*\dfrac{\partial x}{\partial y}=1$
holds automatically when $x$ and $y$ are related by just a
single equation. It also holds when both partial derivatives are understood
to hold the same variables constant, and it usually fails
otherwise.
The problem is that, in many mathematical contexts (at least outside
of physics), these conditions don't often come up naturally, so that
it would be reasonable for people to have the rule of thumb "it
never holds".
Suppose $x$ and $y$ are related by a single equation, but there
is at least one more variable around (so that partial derivative notation
would naturally arise). If we have something like $x=f\left(y,t\right)$
then we might write $\dfrac{\partial x}{\partial y}$ but have no
need to write $\dfrac{\partial y}{\partial x}$. And if we have something
like $F\left(x,y,t,u\right)=0$, then we would either only write things
like $\dfrac{\partial F}{\partial y}$, or perhaps solve for $x$
or $y$ and again only deal with one of $\dfrac{\partial x}{\partial y}$
and $\dfrac{\partial y}{\partial x}$.
A typical more-complicated situation might be something like $u=f\left(x,y\right)$
and $v=g\left(x,y\right)$. Then we would typically think about $u$
and $v$ as a pair, and $x$ and $y$ as a pair. As such, $\dfrac{\partial u}{\partial x}$
would imply keeping $y$ constant. And $\dfrac{\partial x}{\partial u}$
might come up, but it would typically imply keeping $v$ constant,
rather than $y$. This subtlety and related ideas are clarified throughout
this answer.
Notation
There are lots of different notations we can use to discuss the quantities
at hand. Since the properties of something like $\dfrac{\partial y}{\partial x}$
depend on the context in an often-unwritten way, and it can be a
challenge to carefully distinguish between the uses of the variable
letters, I would like to focus primarily on notations that provide
more clarity and/or avoid issues.
If we have a function whose application might be written as $f(x,y,z)$,
then the first/second/third inputs do not depend on the choice of
variable names. This means we can refer to the partial derivatives
of $f$ in an unambiguous way by referring to positions of inputs.
For this post, I will use $\partial_{i}f$ to refer to the partial
derivative with respect to the $i^{\text{th}}$ input. For example,
$\partial_{2}f(a,b,c)={\displaystyle \lim_{y\to b}}\dfrac{f(a,y,c)-f(a,b,c)}{y-b}$;
note that all inputs except for the second are held constant when
taking that limit.
When reverting back to a Leibniz notation style, I will often use
a vertical bar and name the input point for clarity. For example,
if we have $w=f(x,y,z)$, then $\partial_{2}f(a,b,c)=\left.\dfrac{\partial f}{\partial y}\right|_{\left(a,b,c\right)}$,
which may be written as $\left.\dfrac{\partial w}{\partial y}\right|_{\left(a,b,c\right)}$
when that would not cause confusion.
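To make the positional notation concrete, here is a small sketch (my own illustration, assuming SymPy is available; the function $f$ below is just a made-up example). The point is that $\partial_{2}f$ differentiates in the second input slot, regardless of what the variables are named:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 * y + sp.sin(z)   # an arbitrary example function f(x, y, z)

# partial_2 f: differentiate with respect to the second input slot (here named y),
# holding the other inputs constant
d2f = sp.diff(f, y)
print(d2f)                              # x**2
print(d2f.subs({x: 3, y: 5, z: 0}))     # 9, i.e. partial_2 f(3, 5, 0)
```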
Local Inverses
I do not want to dwell on the full complexities of things like the
inverse and
implicit
function theorems in this answer. But to make sense of things, the
concept of solving an equation (or a system) "locally" will be
important in the background.
For example, suppose we have the parabolic curve $y=x^{2}$. Then
there is no function $f(y)$ so that $x=f(y)$ covers all the points
on the curve. However, if we examine things near the point $(-2,4)$
(imagine a small disk centered at that point), then $x=-\sqrt{y}$
works for the piece of the curve near that point (even though
it stops working on the far part of the curve, where $x>0$). But no matter how
close we look to $(0,0)$, no $x=f(y)$ would cover the piece of that
curve in the disk.
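As a quick numerical sanity check of this picture (my own sketch, assuming SymPy is available): on the branch through $(-2,4)$, the slopes $\dfrac{\mathrm{d}y}{\mathrm{d}x}$ and $\dfrac{\mathrm{d}x}{\mathrm{d}y}$ really are reciprocals.

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

# On the branch of y = x**2 through (-2, 4), we can solve locally as x = -sqrt(y).
dydx = sp.diff(x**2, x).subs(x, -2)        # dy/dx = 2x, which is -4 at x = -2
dxdy = sp.diff(-sp.sqrt(y), y).subs(y, 4)  # dx/dy = -1/(2*sqrt(y)), which is -1/4 at y = 4

print(dydx * dxdy)   # 1
```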
This sort of idea generalizes to more variables and more simultaneous
equations: sometimes we can write a variable as a function of the
others to cover everything near some point, even if it doesn't work
far away from that point.
One Equation Works
The number of extra variables doesn't matter here, so I will use two.
Suppose that we are looking at a volume of 4D space given by an equation
in $x,y,z,w$. By moving everything to one side, we can write this
as $H\left(x,y,z,w\right)=0$. Then, assume that we can solve locally
to have $x$ and $y$ as functions of the other variables. Let's say
$y=f(x,z,w)$ and $x=g(y,z,w)$ hold on the volume near a point $\left(a,b,c,d\right)$
satisfying $H(a,b,c,d)=0$. That is, there is a region $R$ of 4D
space surrounding the point $(a,b,c,d)$ where we have $H\left(x,f(x,z,w),z,w\right)=0$
(and no other $y$-coordinates work in $R$) and $H\left(g(y,z,w),y,z,w\right)=0$
(and no other $x$-coordinates work in $R$).
Given that we have $y$ in terms of $x$ and $x$ in terms of $y$,
we can interpret expressions like $\dfrac{\partial y}{\partial x}$
and $\dfrac{\partial x}{\partial y}$ as partial derivatives of $f$
and $g$. Specifically, $\dfrac{\partial y}{\partial x}$ could mean
$\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}=\partial_{1}f(a,c,d)$
and $\dfrac{\partial x}{\partial y}$ could mean $\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}=\partial_{1}g(b,c,d)$.
These quantities are indeed inverses, as can be seen in a couple different
ways.
Restrictions
Note that $\partial_{1}f(a,c,d)$ and $\partial_{1}g(b,c,d)$ both
involve limits that leave the second and third inputs constant, and
they're both evaluated with the second and third inputs equal to $c$
and $d$, respectively. This means that everything of interest happens
in the portion of $R$ consisting of points of the form $\left(x,y,c,d\right)$.
For inputs consistent with $R$, define $\hat{f}(t)=f(t,c,d)$
and $\hat{g}(t)=g(t,c,d)$, so that $y=\hat{f}(x)$ and
$x=\hat{g}(y)$ on this portion of $R$ (meaning that $\hat{f}$
and $\hat{g}$ are inverses). Then we know from single-variable
calculus that $\hat{f}'(a)=1/\hat{g}'\left(\hat{f}(a)\right)$,
or $\hat{f}'(a)\hat{g}'\left(b\right)=1$. But these are
precisely $\partial_{1}f(a,c,d)$ and $\partial_{1}g(b,c,d)$, so
$\dfrac{\partial y}{\partial x}*\dfrac{\partial x}{\partial y}=1$,
as desired.
Differentials
Another way to look at things is to use differentials. We have $\mathrm{d}y=\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}\mathrm{d}x+\left.\dfrac{\partial f}{\partial z}\right|_{\left(a,c,d\right)}\mathrm{d}z+\left.\dfrac{\partial f}{\partial w}\right|_{\left(a,c,d\right)}\mathrm{d}w$.
And similarly, $\mathrm{d}x=\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}\mathrm{d}y+\left.\dfrac{\partial g}{\partial z}\right|_{\left(b,c,d\right)}\mathrm{d}z+\left.\dfrac{\partial g}{\partial w}\right|_{\left(b,c,d\right)}\mathrm{d}w$.
In particular, if $z$ and $w$ are not allowed to vary from $c$
and $d$, then $\mathrm{d}z$ and $\mathrm{d}w$ are $0$, and we
have $\mathrm{d}y=\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}\mathrm{d}x$
and $\mathrm{d}x=\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}\mathrm{d}y$.
Substituting one of these into the other forces $\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}*\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}=1$
whenever $x$ and $y$ actually vary (so that $\mathrm{d}x\ne0$).
We could instead differentiate both sides of $H(x,y,c,d)=0$ with respect to $x$, obtaining $\partial_1H(a,b,c,d)+\partial_2H(a,b,c,d)\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}=0$ by the chain rule/implicit differentiation. Similarly, $\partial_1H(a,b,c,d)\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}+\partial_2H(a,b,c,d)=0$. These yield $\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}=-\partial_1H(a,b,c,d)/\partial_2H(a,b,c,d)$ and $\left.\dfrac{\partial g}{\partial y}\right|_{\left(b,c,d\right)}=-\partial_2H(a,b,c,d)/\partial_1H(a,b,c,d)=1/\left(\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,c,d\right)}\right)$, as expected.
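Here is a concrete sanity check of the one-equation case (my own example; the particular $H$ below is made up, and SymPy is assumed). Take $H(x,y,z,w)=xy+z-w$, which solves locally as $y=f(x,z,w)=(w-z)/x$ and $x=g(y,z,w)=(w-z)/y$:

```python
import sympy as sp

x, y, z, w = sp.symbols('x y z w')

f = (w - z) / x   # y = f(x, z, w), solving H = x*y + z - w = 0 for y
g = (w - z) / y   # x = g(y, z, w), solving the same equation for x

# The point (a, b, c, d) = (2, 3, 1, 7) is on the surface: H = 2*3 + 1 - 7 = 0.
dy_dx = sp.diff(f, x).subs({x: 2, z: 1, w: 7})   # partial_1 f(a, c, d) = -3/2
dx_dy = sp.diff(g, y).subs({y: 3, z: 1, w: 7})   # partial_1 g(b, c, d) = -2/3

print(dy_dx * dx_dy)   # 1
```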
Same Variables Constant
Suppose we have a bunch of equations relating a bunch of variables.
For example, maybe we are interested in the volume in 5D space where
$F\left(x,y,z,w,t\right)=0$ and $G\left(x,y,z,w,t\right)=0$ both
hold. Now it's less clear what an expression like $\dfrac{\partial y}{\partial x}$
should mean. For example, maybe both the $F$ equation and the $G$
equation can be solved locally for $y$ as a function of the other
variables, so that we could take the partial of either function and
get answers that look radically different.
How It Works
Being more clever, suppose that $F\left(x,y,z,w,t\right)=0$ can be solved locally
near $(a,b,c,d,e)$ for $t=f(x,y,z,w)$. And also suppose that $G\left(x,y,z,w,f(x,y,z,w)\right)=0$
can be solved locally for $y=g(x,z,w)$. Then we could take $\partial_{1}g(a,c,d)$
and call that $\dfrac{\partial y}{\partial x}$. Note that in the
limit for $\partial_{1}g(a,c,d)$, we're holding $z$ and $w$ constant
at $c$ and $d$, so in hindsight we could have done that from the
beginning of this calculation. And then we could likely solve $G\left(x,y,z,w,f(x,y,z,w)\right)=0$
locally for $x=h(y,z,w)$ to find a value like $\partial_{1}h\left(b,c,d\right)$
worth calling $\dfrac{\partial x}{\partial y}$. This calculation
with the single equation $G\left(x,y,z,w,f(x,y,z,w)\right)=0$ is just like the calculation
we did with $H\left(x,y,z,w\right)$ earlier, so we have $\partial_{1}g(a,c,d)*\partial_{1}h\left(b,c,d\right)=1$.
Renaming/reordering $z,w,t$ wouldn't change this idea.
What Doesn't Work
However, there are other similar calculations that don't lead to the
same result. For instance, what if we instead had solved $F\left(x,y,z,w,t\right)=0$
for $z=j(x,y,w,t)$, and then solved $G\left(x,y,j(x,y,w,t),w,t\right)=0$
for $x=k(y,w,t)$ to find $\partial_{1}k(b,d,e)$ as our $\dfrac{\partial x}{\partial y}$
instead of $\partial_{1}h\left(b,c,d\right)$. We have no reason
to believe that $\partial_{1}g(a,c,d)*\partial_{1}k(b,d,e)=1$, since
different variables are being held constant in the calculations for
the two factors. In other words, $\partial_{1}k(b,d,e)$ need not
equal $\partial_{1}h\left(b,c,d\right)$.
The MIT OCW notes mentioned earlier discuss a simple example
of this failing that I will summarize here. Consider the surface given
by $w=x^{2}+y^{2}+z^{2}$ and $z=x^{2}+y^{2}$. Then one interpretation
of $\dfrac{\partial w}{\partial x}$ is obtained by substituting
the second equation into the first, so that we have $w=x^{2}+y^{2}+\left(x^{2}+y^{2}\right)^{2}$
and then differentiating to get $2x+4x^{3}+4xy^{2}$, which is rarely
$0$. But we could also solve the second equation for $y^{2}$ and
write $w=x^{2}+\left(z-x^{2}\right)+z^{2}=z+z^{2}$, so that $\dfrac{\partial w}{\partial x}$
would always be $0$. The first method gives "$\dfrac{\partial w}{\partial x}$
if you hold $y$ constant". The second method gives "$\dfrac{\partial w}{\partial x}$
if you hold $z$ constant" (so you are moving around a circular
level curve of $z=x^{2}+y^{2}$ and the distance to the origin $\sqrt{w}$
is not changing). In our notation style, this is saying that when
the two equations are satisfied near a point $\left(a,b,c,d\right)$,
we have that $w=f(x,y):=x^{2}+y^{2}+\left(x^{2}+y^{2}\right)^{2}$
and $w=g(x,z):=z+z^{2}$ and we are noting that (most of the time)
$\partial_{1}f(a,b)\ne\partial_{1}g(a,c)$, even though both have
a reason to be called "$\dfrac{\partial w}{\partial x}$".
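This failure is easy to verify symbolically (my own check of the OCW example, assuming SymPy): the two candidate meanings of $\dfrac{\partial w}{\partial x}$ come out different.

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# Hold y constant: substitute z = x**2 + y**2 into w = x**2 + y**2 + z**2.
w_holding_y = x**2 + y**2 + (x**2 + y**2)**2
dw_dx_y = sp.expand(sp.diff(w_holding_y, x))   # 4*x**3 + 4*x*y**2 + 2*x

# Hold z constant: substitute y**2 = z - x**2, giving w = z + z**2.
w_holding_z = z + z**2
dw_dx_z = sp.diff(w_holding_z, x)              # 0

print(dw_dx_y)
print(dw_dx_z)
```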
From this perspective, the notation $\dfrac{\partial w}{\partial x}$
is ambiguous when there are multiple equations, and it can be worth
introducing extra symbols to the notation to keep track of what is
being held constant. A convention used in the MIT OCW notes, and often
in physics, is to write subscripts for the variables being held constant,
so that $\partial_{1}f(a,b)=\left(\dfrac{\partial w}{\partial x}\right)_{y}$
and $\partial_{1}g(a,c)=\left(\dfrac{\partial w}{\partial x}\right)_{z}$.
In this notation, the previous calculations with $F$ and $G$ were saying that $\left(\dfrac{\partial y}{\partial x}\right)_{z,w}*\left(\dfrac{\partial x}{\partial y}\right)_{z,w}=1$,
but that it was likely that $\left(\dfrac{\partial x}{\partial y}\right)_{w,t}\ne\left(\dfrac{\partial x}{\partial y}\right)_{z,w}$
so $\left(\dfrac{\partial y}{\partial x}\right)_{z,w}*\left(\dfrac{\partial x}{\partial y}\right)_{w,t}\ne1$.
Two Variable Transformation
A common situation in multivariable calculus is a change of variables
for 2D space. Suppose that we are looking at a region of 4D space
consisting of points $\left(x,y,u,v\right)$ where $u=f\left(x,y\right)$
and $v=g\left(x,y\right)$, and that you could (locally) solve for
$x$ in terms of $u$ and $v$ to get $x=h\left(u,v\right)$. For
example, this sort of thing happens with $u=x\cos y,v=x\sin y$ so
that $x=\sqrt{u^{2}+v^{2}}$ near some point where $x>0$. We can
try to see if we can get something like $\dfrac{\partial y}{\partial x}*\dfrac{\partial x}{\partial y}=1$
in this context.
Differential Calculation
Near a point $\left(x,y,u,v\right)=\left(a,b,c,d\right)=\left(a,b,f(a,b),g(a,b)\right)$,
we have $\mathrm{d}u=\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,b\right)}\mathrm{d}x+\left.\dfrac{\partial f}{\partial y}\right|_{\left(a,b\right)}\mathrm{d}y$.
And with $x=h\left(u,v\right)$, we have $\mathrm{d}x=\left.\dfrac{\partial h}{\partial u}\right|_{\left(c,d\right)}\mathrm{d}u+\left.\dfrac{\partial h}{\partial v}\right|_{\left(c,d\right)}\mathrm{d}v$.
Certainly, if $y$ and $v$ don't change so that $\mathrm{d}y,\mathrm{d}v=0$,
then $\mathrm{d}u=\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,b\right)}\mathrm{d}x$
and $\mathrm{d}x=\left.\dfrac{\partial h}{\partial u}\right|_{\left(c,d\right)}\mathrm{d}u$,
so that $\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,b\right)}*\left.\dfrac{\partial h}{\partial u}\right|_{\left(c,d\right)}=1$,
which is analogous to $\dfrac{\partial u}{\partial x}*\dfrac{\partial x}{\partial u}=1$.
However, since $\mathrm{d}v=\left.\dfrac{\partial g}{\partial x}\right|_{\left(a,b\right)}\mathrm{d}x+\left.\dfrac{\partial g}{\partial y}\right|_{\left(a,b\right)}\mathrm{d}y$,
if $\mathrm{d}v$ and $\mathrm{d}y$ are simultaneously $0$ when
$\mathrm{d}x$ is not, that means we must be at a special point where
$\left.\dfrac{\partial g}{\partial x}\right|_{\left(a,b\right)}=0$,
which doesn't typically happen. So $\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,b\right)}*\left.\dfrac{\partial h}{\partial u}\right|_{\left(c,d\right)}=1$
is not expected.
This is not surprising in light of our previous discussion. Note that
$\left.\dfrac{\partial f}{\partial x}\right|_{\left(a,b\right)}=\partial_{1}f(a,b)=\left(\dfrac{\partial u}{\partial x}\right)_{y}$
holds $y$ constant, but $\left.\dfrac{\partial h}{\partial u}\right|_{\left(c,d\right)}=\partial_{1}h(c,d)=\left(\dfrac{\partial x}{\partial u}\right)_{v}$
holds $v$ constant, rather than $y$.
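We can confirm this with the polar-style example from above (my own sketch, assuming SymPy): $\left(\dfrac{\partial u}{\partial x}\right)_{y}\left(\dfrac{\partial x}{\partial u}\right)_{v}$ comes out as $\cos^{2}y$ on the surface, not $1$.

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v', positive=True)

du_dx_y = sp.diff(x * sp.cos(y), x)         # (du/dx)_y = cos(y)
dx_du_v = sp.diff(sp.sqrt(u**2 + v**2), u)  # (dx/du)_v = u/sqrt(u**2 + v**2)

# Evaluate on the surface by substituting u = x*cos(y), v = x*sin(y).
prod = du_dx_y * dx_du_v.subs({u: x * sp.cos(y), v: x * sp.sin(y)})
print(sp.simplify(prod))   # cos(y)**2, which is not identically 1
```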
Multivariable Chain Rule
Throughout the calculations so far, I intentionally avoided the multivariable
chain rule
to simplify things, but it can shed light on both prior calculations
and analogous facts.
Suppose that not only do we have $x=h(u,v)$, but also $y=j(u,v)$.
Then the multivariable chain rule tells us that the linear approximation
to the $f,g$ transformation has to be the inverse of the linear approximation
to the $h,j$ transformation. In other words:
$$
\begin{bmatrix}\partial_{1}f(a,b) & \partial_{2}f(a,b)\\
\partial_{1}g(a,b) & \partial_{2}g(a,b)
\end{bmatrix}\begin{bmatrix}\partial_{1}h(c,d) & \partial_{2}h(c,d)\\
\partial_{1}j(c,d) & \partial_{2}j(c,d)
\end{bmatrix}=\begin{bmatrix}1 & 0\\
0 & 1
\end{bmatrix}
$$
In Leibniz notation, we might write this in the following way, although
the clarifying subscripts are often dropped:
$$
\begin{bmatrix}\left(\dfrac{\partial u}{\partial x}\right)_{y} & \left(\dfrac{\partial u}{\partial y}\right)_{x}\\
\left(\dfrac{\partial v}{\partial x}\right)_{y} & \left(\dfrac{\partial v}{\partial y}\right)_{x}
\end{bmatrix}\begin{bmatrix}\left(\dfrac{\partial x}{\partial u}\right)_{v} & \left(\dfrac{\partial x}{\partial v}\right)_{u}\\
\left(\dfrac{\partial y}{\partial u}\right)_{v} & \left(\dfrac{\partial y}{\partial v}\right)_{u}
\end{bmatrix}=\begin{bmatrix}1 & 0\\
0 & 1
\end{bmatrix}
$$
In particular, while we don't have $\left(\dfrac{\partial u}{\partial x}\right)_{y}\left(\dfrac{\partial x}{\partial u}\right)_{v}=1$,
we do have $\boxed{\left(\dfrac{\partial u}{\partial x}\right)_{y}\left(\dfrac{\partial x}{\partial u}\right)_{v}+\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial u}\right)_{v}=1}$.
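Here is a quick symbolic check of this matrix identity for the polar-style map (my own sketch, assuming SymPy; near a point with $x>0$ and $0<y<\pi/2$ we can take $x=h(u,v)=\sqrt{u^{2}+v^{2}}$ and $y=j(u,v)=\arctan(v/u)$):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v', positive=True)

# Jacobian of the forward map (u, v) = (x*cos(y), x*sin(y)) with respect to (x, y).
J_forward = sp.Matrix([[sp.diff(expr, s) for s in (x, y)]
                       for expr in (x * sp.cos(y), x * sp.sin(y))])

# Jacobian of the local inverse (x, y) = (sqrt(u**2 + v**2), atan(v/u)) w.r.t. (u, v).
J_inverse = sp.Matrix([[sp.diff(expr, s) for s in (u, v)]
                       for expr in (sp.sqrt(u**2 + v**2), sp.atan(v / u))])

# Evaluate the inverse Jacobian on the surface and multiply; this gives the identity.
product = J_forward * J_inverse.subs({u: x * sp.cos(y), v: x * sp.sin(y)})
print(sp.simplify(product))
```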
Extra Identities
We can use the identity above to derive some more identities.
Since we know that $\left(\dfrac{\partial x}{\partial u}\right)_{v}=\left(\dfrac{\partial u}{\partial x}\right)_{v}^{-1}$,
the identity yields $\left(\dfrac{\partial u}{\partial x}\right)_{y}+\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial u}\right)_{v}\left(\dfrac{\partial u}{\partial x}\right)_{v}=\left(\dfrac{\partial u}{\partial x}\right)_{v}$.
But $\left(\dfrac{\partial y}{\partial u}\right)_{v}\left(\dfrac{\partial u}{\partial x}\right)_{v}=\left(\dfrac{\partial y}{\partial x}\right)_{v}$
by the single variable chain rule, so that $\left(\dfrac{\partial u}{\partial x}\right)_{y}+\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial x}\right)_{v}=\left(\dfrac{\partial u}{\partial x}\right)_{v}$
and $\boxed{\left(\dfrac{\partial u}{\partial x}\right)_{y}=\left(\dfrac{\partial u}{\partial x}\right)_{v}-\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial x}\right)_{v}}$.
We can substitute this into the Jacobian determinant $\dfrac{\partial\left(u,v\right)}{\partial\left(x,y\right)}=\det\begin{bmatrix}\left(\dfrac{\partial u}{\partial x}\right)_{y} & \left(\dfrac{\partial u}{\partial y}\right)_{x}\\
\left(\dfrac{\partial v}{\partial x}\right)_{y} & \left(\dfrac{\partial v}{\partial y}\right)_{x}
\end{bmatrix}=\left(\dfrac{\partial u}{\partial x}\right)_{y}\left(\dfrac{\partial v}{\partial y}\right)_{x}-\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial v}{\partial x}\right)_{y}$ to find that $\dfrac{\partial\left(u,v\right)}{\partial\left(x,y\right)}=\left(\left(\dfrac{\partial u}{\partial x}\right)_{v}-\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial x}\right)_{v}\right)\left(\dfrac{\partial v}{\partial y}\right)_{x}-\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\dfrac{\partial v}{\partial x}\right)_{y}$,
which equals $\left(\dfrac{\partial u}{\partial x}\right)_{v}\left(\dfrac{\partial v}{\partial y}\right)_{x}-\left(\dfrac{\partial u}{\partial y}\right)_{x}\left(\left(\dfrac{\partial y}{\partial x}\right)_{v}\left(\dfrac{\partial v}{\partial y}\right)_{x}+\left(\dfrac{\partial v}{\partial x}\right)_{y}\right)$.
A notable result (the "cyclic rule" in the OCW notes) is that
$\left(\dfrac{\partial v}{\partial y}\right)_{x}\left(\dfrac{\partial y}{\partial x}\right)_{v}\left(\dfrac{\partial x}{\partial v}\right)_{y}=-1$.
(This is also discussed in "What is meant by $\frac{\partial x}{\partial y}\frac{\partial y}{\partial z}\frac{\partial z}{\partial x}=-1$? How to interpret it?".)
Combining this with $\left(\dfrac{\partial x}{\partial v}\right)_{y}=1/\left(\dfrac{\partial v}{\partial x}\right)_{y}$
gives $\left(\dfrac{\partial y}{\partial x}\right)_{v}\left(\dfrac{\partial v}{\partial y}\right)_{x}+\left(\dfrac{\partial v}{\partial x}\right)_{y}=0$.
As such, the expression for the Jacobian determinant above
simplifies down to $\boxed{\dfrac{\partial\left(u,v\right)}{\partial\left(x,y\right)}=\left(\dfrac{\partial u}{\partial x}\right)_{v}\left(\dfrac{\partial v}{\partial y}\right)_{x}}$.
This and the previous boxed equation are the "Jacobian rule" in
the OCW notes.
Note that swapping $u$ with $v$ and $x$ with $y$ swaps the rows
and columns of the Jacobian matrix, leaving the determinant the same,
so that we also have $\boxed{\dfrac{\partial\left(u,v\right)}{\partial\left(x,y\right)}=\left(\dfrac{\partial u}{\partial x}\right)_{y}\left(\dfrac{\partial v}{\partial y}\right)_{u}}$.
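As a final sanity check of the Jacobian rule (my own sketch with the polar-style map, assuming SymPy): here $\dfrac{\partial\left(u,v\right)}{\partial\left(x,y\right)}=x$, and $\left(\dfrac{\partial u}{\partial x}\right)_{v}\left(\dfrac{\partial v}{\partial y}\right)_{x}$ agrees with it numerically at a sample point.

```python
import sympy as sp

x, y, v = sp.symbols('x y v', positive=True)

# Full Jacobian determinant of (u, v) = (x*cos(y), x*sin(y)) with respect to (x, y).
jac = sp.Matrix([[sp.diff(x * sp.cos(y), x), sp.diff(x * sp.cos(y), y)],
                 [sp.diff(x * sp.sin(y), x), sp.diff(x * sp.sin(y), y)]]).det()

# (du/dx)_v: eliminate y using v = x*sin(y), which gives u = sqrt(x**2 - v**2).
du_dx_v = sp.diff(sp.sqrt(x**2 - v**2), x).subs(v, x * sp.sin(y))

# (dv/dy)_x: differentiate v = x*sin(y) with respect to y, holding x constant.
dv_dy_x = sp.diff(x * sp.sin(y), y)

point = {x: 2, y: sp.Rational(1, 2)}   # a sample point with 0 < y < pi/2
print(sp.simplify(jac))                          # x
print(float(jac.subs(point)))                    # approximately 2
print(float((du_dx_v * dv_dy_x).subs(point)))    # approximately 2 as well
```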