Confusion about chain rule with Leibniz's notation

Question

Suppose we have two functions $f,g:\Bbb R\rightarrow \Bbb R$. The chain rule states the following about the derivative of the composition of these functions: $$ (f \circ g)'(x) = f'(g(x))\cdot g'(x). $$ However, the equivalent expression in Leibniz notation seems to say something different. I know that $f'(g(x))$ means the derivative of $f$ evaluated at $g(x)$, but the Leibniz form of the chain rule suggests that it should really mean the derivative of $f$ with respect to $g(x)$. If we let $z=f(y)$ and $y=g(x)$, then $$ {\frac {dz}{dx}}={\frac {dz}{dy}}\cdot {\frac {dy}{dx}}, $$ where $\frac{dz}{dy}$ corresponds to $f'(g(x))$. Since $y=g(x)$, I am tempted to believe that the expression $f'(u)$ means the derivative of $f$ with respect to $u$; this would make sense here, as we are treating $g(x)$ as the independent variable. This leaves me with the question: does $f'(g(x))$ mean the derivative of $f$ evaluated at $g(x)$, $\frac{df}{dx} \Bigr\rvert_{x = g(x)}$, or the derivative of $f$ with respect to $g(x)$, $\frac{df}{dg(x)}$?

$f'(X)$ is the derivative of $f$ evaluated at $X$. — hamam_Abdallah, May 04 '21 at 23:12

littleO · Accepted Answer · 2021-05-05T00:12:57.973

In my opinion the usual way of writing the chain rule in Leibniz notation is confusing and, I would say, bad. It's a frequent source of confusion on this website.

The function that is called $z$ on the left is not the same as the function that is called $z$ on the right. In other words, two different functions are being called by the same name. It would be better to give the function on the left its own name, such as $\hat z(x) = z(y(x))$. Then, using Leibniz notation, the chain rule could be written as $\frac{d\hat z}{dx} = \frac{dz}{dy} \frac{dy}{dx}$. This is still a little confusing: $\frac{dz}{dy}$ is to be interpreted as $z'(y(x))$.

In my opinion the notation $$\hat z'(x) = z'(y(x)) y'(x)$$ is far clearer.

To specifically address the final part of your question: $f'(g(x))$ is the derivative of $f$ evaluated at $g(x)$. I would not use the phrase "derivative of $f$ with respect to $g(x)$".

Edit: Here is the thought process behind the Leibniz notation, and an explanation for why it has become so popular despite the fact that I think it's confusing.

Think about the quantity $z(y(x))$, and imagine what happens if $x$ is perturbed by a small amount $\Delta x$. Then the output of $y$ is perturbed by a small amount $\Delta y$, and the output of $z$ is correspondingly perturbed by a small amount $\Delta z$. And we have $$ \frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y} \frac{\Delta y}{\Delta x} $$ The term on the left is approximately $\hat z'(x)$, but you can see the temptation to call it $\frac{dz}{dx}$. The term $\frac{\Delta z}{\Delta y}$ is approximately $z'(y(x))$, but you can see the temptation to call it $\frac{dz}{dy}$. And the term $\frac{\Delta y}{\Delta x}$ is approximately $y'(x)$, and of course you see the temptation to call it $\frac{dy}{dx}$.
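A quick numerical sanity check of this identity, as a minimal sketch: the particular choices $y(x) = x^5 + 1$ and $z(u) = \sin u$, the point `x0`, and the step `dx` are assumptions made purely for illustration. The code perturbs $x$, records the resulting $\Delta y$ and $\Delta z$, and compares each difference quotient with the derivative it is supposed to approximate.

```python
import math

def y(x):
    # inner function (illustrative assumption)
    return x**5 + 1

def z(u):
    # outer function (illustrative assumption)
    return math.sin(u)

def difference_quotients(x, dx):
    """Perturb x by dx and return (dz/dx, dz/dy, dy/dx) as finite quotients."""
    dy = y(x + dx) - y(x)              # induced perturbation of y
    dz = z(y(x) + dy) - z(y(x))        # induced perturbation of z
    return dz / dx, dz / dy, dy / dx

x0 = 0.7
dz_dx, dz_dy, dy_dx = difference_quotients(x0, 1e-6)

# The quotients multiply exactly (up to rounding) ...
print(dz_dx, dz_dy * dy_dx)
# ... and each one approximates a derivative evaluated at the right point:
print(dz_dy, math.cos(y(x0)))          # z'(y(x0)), not z'(x0)
print(dy_dx, 5 * x0**4)                # y'(x0)
```

Note that the middle quotient is compared with $z'$ evaluated at $y(x_0)$, which is exactly the interpretation that $\frac{dz}{dy}$ leaves implicit.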

@JohnHippisley That is just the definition of the notation $dz/dy$, and part of the reason I dislike Leibniz's notation. I think you will agree with me that it is weird to use $dz/dy$ to mean $z'(y(x))$. — Jackozee Hakkiuz, May 04 '21 at 23:53
@JackozeeHakkiuz Yes, I will most certainly agree. However I managed to think of an explanation: if we define $z$ as being the value of $f(y)$, it follows that $\frac{dz}{dy}$ is the derivative of $f$. Now, because $y$ itself is a function of $x$, this derivative is evaluated at the value of $y$, namely $y(x)$. I think this is correct, but as you mentioned in your answer, a lot of interpretation is required. — John Hippisley, May 04 '21 at 23:59
Hi Daniel. Could you explain your objections to Leibnizian notation in more detail? If $y=f(u)$, where $u=g(x)$, then the chain rule is $$\frac{dy}{dx}=\frac{dy}{du} \cdot \frac{du}{dx} \, .$$ Since the variables $y$ and $x$ are linked by the composite function $f \circ g$, $\frac{dy}{dx}$ denotes $(f\circ g)'(x)$. However, $y$ and $u$ are simply connected by the function $f$, and so $\frac{dy}{du}$ means $f'(u)=f'(g(x))$. Finally, since $u=g(x)$, $\frac{du}{dx}=g'(x)$. — Joe, May 31 '21 at 14:49
It seems to me that this formulation of the chain rule is pretty clear when done this way. Things only start to get muddled when you denote $f'(x)$ as $\frac{df}{dx}$, since then the numerator of the fraction should be a variable, not a function (at least, that's the way that Leibniz thought about it). — Joe, May 31 '21 at 14:49

score 2 · Answer 2 · answered May 04 '21 at 23:14

$f'(g(x))$ means the derivative of $f$ evaluated at $g(x)$. Really, the ambiguous one is Leibniz notation, because it makes you think the function $f$ "cares" about the name of its argument. $f$ is a function of one variable, so it can only be differentiated with respect to one thing: its only input.
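To make this concrete, here is a hedged sketch in Python. The helper `derivative` (a central-difference approximation) is a hypothetical name introduced only for this illustration, and $f = \sin$, $g(x) = x^5 + 1$ are sample functions. The point is that `f_prime` is built from `f` alone, with no reference to any argument name, and $f'(g(x))$ is then just that function applied to the number $g(x)$.

```python
import math

def derivative(func, h=1e-6):
    """Return a function approximating func' by a central difference.

    Hypothetical helper for illustration; it only ever sees `func`,
    never the name of the variable `func` will later be applied to.
    """
    def func_prime(u):
        return (func(u + h) - func(u - h)) / (2 * h)
    return func_prime

f = math.sin                      # sample outer function
g = lambda x: x**5 + 1            # sample inner function

f_prime = derivative(f)           # depends on f alone
x0 = 0.7

outer = f_prime(g(x0))                           # f'(g(x0)): f' evaluated at g(x0)
inner = derivative(g)(x0)                        # g'(x0)
composite = derivative(lambda x: f(g(x)))(x0)    # derivative of the composite at x0

print(composite, outer * inner)
```

The two printed numbers should agree closely, which is the chain rule $(f\circ g)'(x)=f'(g(x))\,g'(x)$ read literally.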

Jackozee Hakkiuz

So then why is $\frac{dz}{dy}=f'(g(x))$? Also, is it correct to say that the former expression is the derivative of $f \circ g$ with respect to $g(x)$ (i.e. we now treat $g(x)$ as the independent variable)? – John Hippisley May 04 '21 at 23:23
Basically, yes. Since we're using $y = g(x)$ and differentiating with respect to that $y$ by ignoring its relationship to $x$. You could also write $\frac{df}{dg(x)}$ and it would be correct, if a bit weird and ugly. – ConMan May 04 '21 at 23:39
That's exactly why I say Leibniz notation is ambiguous (at least in my view). They use the notation $dz/dy$ to mean $f'(g(x))$, which I find weird. Just out of curiosity, if $g(x)=x^5+1$ and $f(x)=\sin x$, how would you calculate the derivative of $f$ with respect to $g(x)$? – Jackozee Hakkiuz May 04 '21 at 23:49
My point is that Leibniz's notation requires interpretation, whereas $(f\circ g)'(x)=f'(g(x))g'(x)$ means exactly what it is written. – Jackozee Hakkiuz May 04 '21 at 23:51
@JohnHippisley Perhaps you might find this or this answer (and the others linked in that post by Jackozee Hakkiuz) useful – peek-a-boo May 05 '21 at 00:06
@peek-a-boo Thank you, those answers were very helpful. – John Hippisley May 05 '21 at 01:51

score 2 · Answer 3 · answered May 05 '21 at 09:20

While the other answers deal with the modern definition of derivatives, it is not actually impossible to make the original Leibniz notation completely rigorous as I sketched here (see "Notes"). In fact, doing so yields a generalization of the usual notion of derivatives (at least for one parameter), as shown by the examples in the linked post.

Furthermore, we can completely explain the error in your reasoning in this framework. $ \def\lfrac#1#2{{\large\frac{#1}{#2}}} $

Take any variables $x,y,z$ varying with parameter $t$ (which may well be $x$ or may be something else we do not care about). Then whenever $\lfrac{dz}{dy},\lfrac{dy}{dx}$ are defined, we have $\lfrac{dz}{dx} = \lfrac{dz}{dy} \cdot \lfrac{dy}{dx}$. If furthermore there are functions $f,g$ such that $z = f(y)$ and $y = g(x)$ everywhere (i.e. for every $t$), then by plain substitution $\lfrac{d(f(g(x)))}{dx} = \lfrac{d(f(y))}{dy} \cdot \lfrac{d(g(x))}{dx}$, which is equivalent to $(f \circ g)'(x) = f'(y) \cdot g'(x)$. Since $f'(y) = f'(g(x))$ everywhere, there is nothing wrong here at all!

So what is the error? $f'(u)$ is not "the derivative of $f$ with respect to $u$". That phrase actually does not make sense, because $f$ is a function in the modern sense and does not have any 'independent variable'! Instead, $f'(u) = \lfrac{d(f(u))}{du}$ for every variable $u$ whose value is always in the domain of $f$.

So $f'(g(x))$ is the derivative of $f$ at $g(x)$ but is not what you thought. Your "$\lfrac{df}{dx}|_{x=g(x)}$" does not make sense for two reasons: (1) Leibniz notation cannot be (correctly) mixed with (modern) functions, so "$\lfrac{df}{dx}$" is incorrect; (2) "$x=g(x)$" is meaningless. Instead, $f'(g(x)) = \lfrac{d(f(g(x)))}{d(g(x))}$, exactly in line with the above explanation of the Leibniz chain rule.
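For a concrete check, take the functions from a comment elsewhere in this thread, $f(u)=\sin u$ and $g(x)=x^5+1$ (an illustrative choice, not part of the argument above). Writing $u = g(x)$: $$ f'(g(x)) = \frac{d(\sin u)}{du} = \cos u = \cos(x^5+1), \qquad (f\circ g)'(x) = \cos(x^5+1)\cdot 5x^4. $$ The first expression differentiates $\sin$ and only afterwards substitutes $u = x^5+1$; at no point is $f$ itself differentiated with respect to $x$.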

By the way, the reason for having variables $x,y,z$ possibly different from the underlying parameter $t$ is that in many applications it is often the case that we are interested in variables that in reality vary with respect to time $t$, but have some relation that does not depend on time, such as here.
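The following is only a rough numerical sketch of that situation, not the rigorous definition from the linked post (which is not reproduced here). It assumes a hypothetical motion $x(t) = e^t$ together with sample functions $g(x) = x^5+1$ and $f = \sin$, lets everything vary with $t$, and forms the three quotients from the changes produced by a small step `dt`.

```python
import math

def x_of(t):
    # assumed dependence of x on time t (hypothetical choice)
    return math.exp(t)

def g(x):
    # sample relation y = g(x); it does not involve t
    return x**5 + 1

f = math.sin   # sample relation z = f(y); it does not involve t either

def snapshot(t):
    """Values of the variables x, y, z at parameter value t."""
    x = x_of(t)
    y = g(x)
    z = f(y)
    return x, y, z

t0, dt = -0.3, 1e-6
x0, y0, z0 = snapshot(t0)
x1, y1, z1 = snapshot(t0 + dt)
dx, dy, dz = x1 - x0, y1 - y0, z1 - z0   # changes caused by the step in t

dz_dx = dz / dx
dz_dy = dz / dy
dy_dx = dy / dx

print(dz_dx, dz_dy * dy_dx)     # the quotients multiply as in the chain rule
print(dz_dy, math.cos(y0))      # approximately f'(y) at the current y
print(dy_dx, 5 * x0**4)         # approximately g'(x) at the current x
```

Changing `x_of` to any other smooth function with nonzero rate of change at `t0` should leave the last two comparisons intact, since the relations $y = g(x)$ and $z = f(y)$ do not depend on how $x$ moves in time.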

Ah, I think I see the fundamental idea behind Leibniz notation now. It requires interpretation: one must know what we are calling the independent and dependent variables and where the derivative is being evaluated. We can choose to name the argument of a function and its corresponding output, say $x$ and $y$, respectively. With this we can call the ratio of the resulting change in $y$ to an infinitesimal change in $x$ the "derivative" of the function. Leibniz notation is purely suggestive and, in that regard, it can be useful. — John Hippisley, May 05 '21 at 14:25
@JohnHippisley: Yes that's right; the correct interpretation of Leibniz notation works, but not a sloppy one that conflates modern functions with (input/output) variables. It is very common even in other areas of mathematics to conflate "$f$" with "$f(x)$", but it never ends well. Anyway take a look at the linked post to see that Leibniz notation with the correct definitions can be much more powerful than the standard notion of derivative. I want to emphasize that there is no need to rely on 'infinitesimals' once we interpret "Δt → 0" as a sequence that is eventually not 0 but tending to 0. — user21820, May 05 '21 at 15:00
So although it is incorrect to say that the derivative is a quotient of “infinitesimal” differentials, it is correct to say that it is the limit of this quotient when the differentials approach zero? — John Hippisley, May 05 '21 at 16:19
@JohnHippisley: Yes! That's what drove Leibniz's original insight, and was later formalized in the modern definition $f'(x) = \lim_{h→0} \frac{f(x+h)-f(x)}{h}$. The disadvantage of the modern definition is that the original insight was obscured, and that is why I constructed the interpretation in the linked post to restore the insight while maintaining 100% mathematical rigour. — user21820, May 05 '21 at 16:55
