2

I think that I didn't properly understand how to use Leibniz notation for derivatives and partial derivatives. I know that: $$\frac{df}{dx}=f'$$ and here I don't have any problem. But if $f,g$ are two functions, how should I interpret $$\frac{df}{dg}$$ Is it $f'\circ g$?

Things get worse when we have to work in several variables. Let:

$\mathbf{f}:\mathbb{R}^n\to\mathbb{R}^m$

$\mathbf{g}:\mathbb{R}^m\to\mathbb{R}^p$

$\mathbf{\Phi}:=\mathbf{g} \circ \mathbf{f}$

$\mathbf{\Phi}(\mathbf{x})=(\Phi_1(\mathbf{x}),...,\Phi_p(\mathbf{x}))$

Here's how I would write the $j$-th partial derivative of the $i$-th component function of $\mathbf{\Phi}$: $$\frac{\partial \Phi_i}{\partial x_j}(\mathbf{x})=\sum_{k=1}^{m} \left[ \frac{\partial g_i}{\partial x_k}(\mathbf{f}(\mathbf{x}))\right ]\left[\frac{\partial f_k}{\partial x_j}(\mathbf{x})\right ] $$ Or, if I want to omit the argument: $$\frac{\partial \Phi_i}{\partial x_j}=\sum_{k=1}^{m} \left[ \frac{\partial g_i}{\partial x_k}\circ \mathbf{f}\right ]\left[\frac{\partial f_k}{\partial x_j}\right ] $$ But my book writes it in this way: $$\frac{\partial \Phi_i}{\partial x_j}=\sum_{k=1}^{m} \frac{\partial g_i}{\partial f_k} \frac{\partial f_k}{\partial x_j} $$

So am I supposed to understand by magic that: $$\frac{\partial g_i}{\partial f_k}:=\frac{\partial g_i}{\partial x_k}\circ \mathbf{f}$$ I know that the formula given by the book is more elegant and concise, but when I read it the first time I didn't understand anything. My question is: is there a standard convention for this kind of notation? I'm seriously starting to hate this notation, not only because of how unreadable it is (to me), but also because of all the "differential cancellation" that we do in ODEs (but this will maybe be part of a future question).
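To make the reading $\frac{\partial g_i}{\partial f_k}=\frac{\partial g_i}{\partial x_k}\circ \mathbf{f}$ concrete, here is a minimal numerical sketch. The particular maps `f` and `g`, the helper `partial`, and the test point `x0` are arbitrary choices made up for illustration (they are not from the book). The derivative of each $g_i$ is taken with respect to its own $k$-th slot and only then evaluated at $\mathbf{f}(\mathbf{x})$, and the resulting sum is compared with a direct finite-difference derivative of $\mathbf{\Phi}$.

```python
import math

# Hypothetical example maps, chosen only for illustration.
def f(x):
    # f: R^2 -> R^2
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def g(y):
    # g: R^2 -> R^2
    return [y[0] ** 2 + y[1], math.exp(y[0]) * y[1]]

def phi(x):
    # Phi = g o f
    return g(f(x))

def partial(func, i, j, point, step=1e-6):
    """Central-difference estimate of the derivative of component i of func
    with respect to its j-th argument slot, at the given point."""
    up = list(point)
    down = list(point)
    up[j] += step
    down[j] -= step
    return (func(up)[i] - func(down)[i]) / (2 * step)

def chain_rule_sum(i, j, x):
    """sum_k (d g_i / d slot k)(f(x)) * (d f_k / d x_j)(x)."""
    y = f(x)  # the point at which the derivatives of g are evaluated
    return sum(partial(g, i, k, y) * partial(f, k, j, x) for k in range(len(y)))

x0 = [0.7, -0.4]
results = []
for i in range(2):
    for j in range(2):
        direct = partial(phi, i, j, x0)       # d Phi_i / d x_j computed directly
        via_chain = chain_rule_sum(i, j, x0)  # the book's sum, read as "evaluate at f(x)"
        results.append((direct, via_chain))
        print(i, j, direct, via_chain)
```

In this sketch the two columns agree up to finite-difference error for every pair $(i,j)$, which is consistent with reading $\partial g_i/\partial f_k$ as "differentiate $g_i$ in its $k$-th slot, then evaluate at $\mathbf{f}(\mathbf{x})$".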

Thank you :)

Kandinskij
  • 3,709
  • Is it the partial derivatives that are bothering you, or just the notation? In single-variable calculus, if you have $f \colon \Bbb{R} \to \Bbb{R}$ and $g \colon \Bbb{R} \to \Bbb{R}$, then does the notation $\frac{df}{dg} \cdot \frac{dg}{dx}$ also bother you? If so, the question could be vastly simplified by not referring to multi-variable functions at all. – Nick Jan 08 '21 at 06:13
  • It's just the notation because, even in your notation I don't get what $\frac{df}{dg}$ should mean – Kandinskij Jan 08 '21 at 13:53
  • In short, "yes", in the context of the chain rule equation, if you compare the two ways of writing the chain rule: $(f \circ g)'(x) = f'(g(x)) \cdot g'(x)$ and $\frac{d}{dx}(f \circ g) = \frac{df}{dg} \frac{dg}{dx}$, we are clearly supposed to interpret $\frac{df}{dg}$ as meaning $f'(g(x))$. – Nick Jan 08 '21 at 17:04
  • And in the partial derivative case? – Kandinskij Jan 08 '21 at 17:22
  • Yes, the same thing of course. $\frac{\partial g_i}{\partial f_j}$ means take the derivative of $g_i$, and evaluate at $f_1,\dots,f_m$. In the context of the chain rule, at least. – Nick Jan 08 '21 at 17:23
  • So it's an ad-hoc notation defined for the chain rule? And was I supposed to magically understand it without any further justification? Do you understand it because you've seen it before, or is there a way to understand this kind of notation logically? – Kandinskij Jan 08 '21 at 17:30
  • @Eureka I dislike this Leibniz notation and find it to be unclear also. Suppose that $F(x) = f(g(x))$. The chain rule tells us that $F'(x) = f'(g(x)) g'(x)$, which is perfectly clear notation. In Leibniz notation, this is often written as $\frac{dF}{dx} = \frac{df}{dg} \frac{dg}{dx}$. On the right, the expression $\frac{df}{dg}$ in this context refers to the number $f'(g(x))$. I was confused by that at one point. – littleO Jan 08 '21 at 19:23
  • @littleO The chain rule in that form makes sense to me: in my mind, $df/dx$ is just a shorthand for $f'$. And if it is not made explicit what the argument of $f'$ is, then it is assumed that the argument is on the 'denominator'. So to me, $df/dg$ means $f'$, but since the argument is not specified, it is $f'(g(x))$. Still, I agree with you that a bit of fudging goes into this. – Joe Jan 08 '21 at 19:30
  • @Joe yes but what about: $\frac{d(f\circ g \circ h)}{dx}=\frac{df}{dg}\frac{dg}{dh}\frac{dh}{dx}$ (I've seen this notation a lot). By your logic it should be: $\frac{d(f\circ g \circ h)}{dx}=\frac{df}{d(g\circ h)}\frac{dg}{dh}\frac{dh}{dx}$. I think that this notation is really (really) context dependent. – Kandinskij Jan 08 '21 at 19:34
  • @Eureka I think there is a way of making sense of this using Leibnizian notation, but it's too long for a comment. I'll write up an answer soon. I'll readily admit that the notation does involve some fudging, but in a way that can also be one of its strengths. – Joe Jan 08 '21 at 19:44
  • @Eureka At the bottom of my post, I outline how we can avoid this problem. – Joe Jan 08 '21 at 20:40
  • I would explain the intuition behind Leibniz notation like this. Suppose $F(x) = f(g(x))$. Imagine that we change the input to $F$ by a tiny amount $dx$. Let $dg$ be the corresponding change in the output of $g$ and let $df$ be the corresponding change in the output of $f$. The way I've phrased this, $dx, dg$, and $df$ are tiny real numbers. Now, doing usual arithmetic with real numbers, we have $df/dx = (df/dg) (dg/dx)$. (We're literally just dividing real numbers here.) But if you think about it, $df/dg$ is the same thing as (or at least nearly equal to) $f'(g(x))$. – littleO Jan 08 '21 at 20:50

2 Answers

4

In the context of the chain rule, the notation is understood as you said (and as I said in the comments): it is exactly the reading that you described as having to be understood "by magic".

Let me give an example to show a slightly different perspective, which hopefully will be the more "natural" interpretation you are looking for.

Consider a circle, with radius $r$. Its area is $A = \pi r^2$. Its circumference is $C = 2\pi r$. We can write $A$ in terms of $C$. While this might seem a little unnatural at first, just bear with me. Take the equation for $C$ and square both sides to get $C^2 = 4\pi^2 r^2$. Now just factor out $4\pi$ to get $C^2 = 4\pi \cdot \pi r^2 = 4\pi A$. Solving for $A$, you get $$ A = \frac{C^2}{4\pi} $$

Now $A$ is a function of $C$, and $C$ is a function of $r$, so we are in a situation where we can use the chain rule, which tells us: $\frac{dA}{dr} = \frac{dA}{dC} \frac{dC}{dr}$. This of course gives $2\pi r$ either way you compute it, but that's not the point. My point is that because of the equation $A = \frac{C^2}{4\pi}$ above, we can make sense of the expression $\frac{dA}{dC}$ without even mentioning $r$! If you just "forget" for a moment that $C$ depends on $r$, and think of $C$ itself as being the independent variable (taking the equation $A = \frac{C^2}{4\pi}$ out of context), then $\frac{dA}{dC} = \frac{C}{2\pi}$ makes perfect sense as just the ordinary derivative (no magical trick of notation).
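If it helps, the circle example can be checked numerically. The following is only a rough sketch: the function names `area_from_radius`, `circumference_from_radius`, `area_from_circumference` and the helper `derivative` are names I made up for illustration, and the radius `r = 1.5` is arbitrary. Note that `area_from_circumference` never mentions $r$ at all.

```python
import math

def circumference_from_radius(r):
    return 2 * math.pi * r           # C = 2*pi*r

def area_from_radius(r):
    return math.pi * r ** 2          # A = pi*r^2

def area_from_circumference(c):
    return c ** 2 / (4 * math.pi)    # A = C^2 / (4*pi), with C as the variable

def derivative(func, point, step=1e-6):
    """Central-difference estimate of func'(point)."""
    return (func(point + step) - func(point - step)) / (2 * step)

r = 1.5
c = circumference_from_radius(r)

dA_dC = derivative(area_from_circumference, c)    # C treated as an independent variable
dC_dr = derivative(circumference_from_radius, r)
dA_dr = derivative(area_from_radius, r)

print(dA_dC, c / (2 * math.pi))             # dA/dC compared with C/(2*pi)
print(dA_dC * dC_dr, dA_dr, 2 * math.pi * r)  # both routes compared with 2*pi*r
```

The only place where $r$ re-enters is the choice of the point `c` at which $dA/dC$ is evaluated, which is the "evaluate at $C(r)$" step hidden in the Leibniz form of the chain rule.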

My point here is that you can interpret $\frac{df}{dg}$ as just an ordinary derivative, thinking of $g$ itself as the variable (forgetting that $g$ is actually a function). The example above was supposed to illustrate that in many "real-life" situations where the functions have intuitive geometrical meanings, they can be thought of either as variables in their own right or as functions of another variable. In my circle example, $C$ could either be interpreted as a function of $r$, or we could just think of $C$ itself as a variable (and ignore the relation between $C$ and $r$).

Nick
  • 5,618
1

Here is how I would state the chain rule in Leibnizian notation:

Let $y=f(u)$ and $u=g(x)$. If $g$ is differentiable at $x$ and $f$ is differentiable at $u=g(x)$, then $f \circ g$ is differentiable at $x$, and $$ \frac{dy}{dx}=\frac{dy}{du} \cdot \frac{du}{dx} \, . $$

This can be easily translated back into Lagrange notation in the following way. The first term, $dy/dx$, is simply a shorthand for $$ \frac{df(g(x))}{dx}=(f \circ g)'(x) \, . $$ We know that $y=f(u)$, and so $$ \frac{dy}{du}=f'(u)=f'(g(x)) \, . $$ Finally, we know that $u=g(x)$, meaning that $$ \frac{du}{dx}=g'(x) \, . $$


The only perplexing feature of this notation is that $u$ is treated both as a variable (in $dy/du$) and as a function (in $du/dx$). However, while many people might argue against this notation for this very reason, it does have its advantages. Since $u=g(x)$, we may as well write $dy/du$ as $$ \frac{df(g(x))}{dg(x)} \, , $$ but since $g(x)$ is just a dummy variable that shows where we are evaluating the derivative, this can be rewritten as $$ \frac{df(u)}{du} \Biggr|_{u=g(x)} \, . $$ I go into more detail about this here.
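A small numerical sketch may make the "evaluate at $u=g(x)$" step less mysterious. The choices $f=\sin$, $g(x)=x^2$, the point `x = 0.8`, and the helper `derivative` are assumptions made only for this illustration. The idea is that $dy/du$ is computed from $f$ alone and then evaluated at the number $g(x)$, which is a different quantity from $dy/dx$.

```python
import math

def f(u):
    return math.sin(u)   # y = f(u), illustrative choice

def g(x):
    return x ** 2        # u = g(x), illustrative choice

def derivative(func, point, step=1e-6):
    """Central-difference estimate of func'(point)."""
    return (func(point + step) - func(point - step)) / (2 * step)

x = 0.8

# (df/du) evaluated at u = g(x): differentiate f as a function of u only,
# then plug in the number g(x). This is f'(g(x)).
dy_du_at_gx = derivative(f, g(x))

# du/dx = g'(x)
du_dx = derivative(g, x)

# dy/dx = (f o g)'(x), computed directly from the composite function.
dy_dx = derivative(lambda s: f(g(s)), x)

print(dy_du_at_gx, math.cos(g(x)))   # f'(g(x)) compared with cos(x^2)
print(dy_dx, dy_du_at_gx * du_dx)    # chain rule: the two values should agree
```

Here `dy_du_at_gx` and `dy_dx` are visibly different numbers; they are related only through the extra factor `du_dx`.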


For $y=f(u)$, where $u=g(t)$ and $t=h(x)$, the chain rule becomes $$ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dt} \cdot \frac{dt}{dx} \, . $$ Again, $dy/du = f'(u)$, which here equals $f'(g(h(x)))$ because $u=g(h(x))$; the same translation can be made for the other terms, giving $du/dt=g'(h(x))$ and $dt/dx=h'(x)$.
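As a hedged illustration of the three-link case, the sketch below uses made-up functions ($f=\sin$, $g(t)=t^3$, $h=\exp$) and a made-up helper `derivative`. Each Leibniz factor is differentiated with respect to its own intermediate variable and evaluated at the value that variable takes, so no $d(g\circ h)$ needs to be written.

```python
import math

def f(u):
    return math.sin(u)   # y = f(u), illustrative choice

def g(t):
    return t ** 3        # u = g(t), illustrative choice

def h(x):
    return math.exp(x)   # t = h(x), illustrative choice

def derivative(func, point, step=1e-6):
    """Central-difference estimate of func'(point)."""
    return (func(point + step) - func(point - step)) / (2 * step)

x = 0.3
t = h(x)      # value of the intermediate variable t
u = g(t)      # value of the intermediate variable u, i.e. g(h(x))

dy_du = derivative(f, u)   # f'(u)  = f'(g(h(x)))
du_dt = derivative(g, t)   # g'(t)  = g'(h(x))
dt_dx = derivative(h, x)   # h'(x)

product = dy_du * du_dt * dt_dx
direct = derivative(lambda s: f(g(h(s))), x)   # (f o g o h)'(x)

print(product, direct)
```

The point of the design is that the variables `t` and `u` are just numbers once `x` is fixed, so each factor is an ordinary single-variable derivative evaluated at a number.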

Joe
  • 19,636