4

One problem I have found when learning calculus is that there are many different ways to denote the derivative. If $y=f(x)=x^2$, then we could write

\begin{align} f'(x)&=2x \\ y'&=2x \\ \frac{df}{dx}(x)&=2x \\ \frac{df(x)}{dx}&=2x \\ \frac{d}{dx}f(x)&=2x \\ \frac{dy}{dx}&=2x \end{align}

And these are just Lagrange's and Leibniz's notations alone. What I find troubling is that they all seem to be suggesting subtly different things about what the derivative actually is. Is it a function, a limit of a quotient, or both? In the interests of keeping my post brief, I'll focus my attention on $f'(x)=2x$ and $\frac{dy}{dx}=2x$, as these seem to be the most common notations.

$$ f'(x)=2x $$

It does make sense to think of the derivative as the gradient function: $$ f'\colon x\mapsto\lim_{\Delta x \to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x} $$ In this case the limit expression is equal to $2x$, and so we can write $$ f' \colon x \mapsto 2x $$ However, this notation seems a little counter-intuitive when we consider what it means to differentiate a function with respect to a variable other than $x$. If I ask what is the derivative of $f(x)$ with respect to $\frac{x}{2}$, does this question make sense? Is it simply $f'(\frac{x}{2})$? Or do we have to express $x^2$ in terms of $\frac{x}{2}$? And how can we express this derivative using Lagrange's notation?
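(For concreteness, here is a minimal sketch of the limit computation above, using sympy purely as an illustration; the variable names are my own.)

```python
import sympy as sp

x, dx = sp.symbols('x Delta_x')

# difference quotient for f(x) = x^2, then take the limit as Delta_x -> 0
f = lambda s: s**2
derivative = sp.limit((f(x + dx) - f(x)) / dx, dx, 0)
print(derivative)  # 2*x
```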

$$ \frac{dy}{dx}=2x $$

There are many things which are nice about Leibniz's notation, including the fact that it makes explicit which variable you are differentiating with respect to. However, in this case, it is unclear whether we are talking about a function, or something else entirely. There are other issues. Some people say they dislike the Leibniz formulation of the chain rule $$ \frac{dy}{dx}=\frac{dy}{du}\frac{du}{dx} $$ because they find it inaccurate. I don't really understand why this is the case. Could someone please elaborate?

Sebastiano
  • 7,649
  • 3
    I addressed some of these questions in a previous answer of mine, I think you'll find it helpful (though part of it is about multivariable calculus). The derivative at a point is a number (and this number is calculated as the limit of the difference quotient). The derivative $f'$ is itself a function $f':\Bbb{R}\to \Bbb{R}$. I personally find the "derivative with respect to..." language confusing when first learning because it completely confused me on what is a function vs where the function is being evaluated etc. – peek-a-boo Aug 20 '20 at 15:52
  • @peek-a-boo Your answer is a poem :-) :-)...excellent, both yours and the present answer. – Sebastiano Aug 20 '20 at 15:53
  • @peek-a-boo I did find that helpful, thanks, although I am still a little unsure why it is inaccurate to write the chain rule as $dy/dx=dy/du \cdot du/dx$. Is it possible if you go into a bit more detail? I don't see much of a problem with $y$ depending on both $u$ and $x$, given that $u$ and $x$ are also related. – Joe Lamond Aug 20 '20 at 16:07

3 Answers

7

Derivatives at a point are numbers (and these numbers are calculated as limits of a certain quotient), and if for each point you assign a number which is the derivative at that point, then you of course get a function $\Bbb{R}\to \Bbb{R}$. Leibniz's notation is confusing because it doesn't tell you where the derivatives are being evaluated, and hence blurs the distinction between functions and function values. (It may not seem like such a big deal, especially when doing simple problems, but I guarantee that it will quickly get very confusing in multivariable calculus if all these basic concepts aren't kept straight.)
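A minimal sketch of this distinction, using sympy purely for illustration (the names below are my own, not standard notation):

```python
import sympy as sp

x = sp.symbols('x')
f = x**2

f_prime = sp.diff(f, x)            # the derivative as a function R -> R: 2*x
f_prime_at_3 = f_prime.subs(x, 3)  # the derivative *at* the point 3: the number 6
print(f_prime, f_prime_at_3)       # 2*x 6
```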

Writing the chain rule as $\dfrac{dy}{dx} = \dfrac{dy}{du} \dfrac{du}{dx}$ is inaccurate for several reasons:

  1. It introduces completely irrelevant letters in the denominator (an unfixable flaw with Leibniz's notation)
  2. Doesn't tell you where the derivatives (which are functions as I explained in my previous paragraph) are being evaluated (you can try to make this more precise, but then you lose the "simplicity" of Leibniz's notation).
  3. The $y$ on the LHS has a completely different meaning from the $y$ on the RHS (this wouldn't be a huge deal if there was no chance of confusion... but unfortunately it causes a lot of confusion especially in several variables; see link below)

The third is I think the biggest problem, and I'll try to explain that now. In Lagrange's notation, the chain rule is expressed as $(y\circ u)'(x) = y'(u(x)) \cdot u'(x)$, or if you want to write a proper equality of functions, it is just $(y\circ u)' = (y'\circ u)\cdot u'$. So, there are actually three functions involved: there is $y$, there is $u$ and there is the composition $y\circ u$. The chain rule tells us how the derivatives of these three functions are related.

However, when you write $\dfrac{dy}{dx} = \dfrac{dy}{du}\cdot \dfrac{du}{dx}$, it gives the incorrect impression that there are only two functions, $y$ and $u$. Well, now you could argue that on the LHS we should "consider $y$ as a function of $x$" while on the RHS "$y$ is a function of $u$" so these are different things. This is of course right, the two things are very different, but this is all covered up in the notation. A perhaps slightly better way of writing it would be $\dfrac{d(y\circ u)}{dx} = \dfrac{dy}{du} \cdot \dfrac{du}{dx}$. But this is also not quite correct. Basically, any attempt to write the chain rule down formally is a huge nightmare. The best I can do is say that for every $x\in \text{domain}(u)$, \begin{align} \dfrac{d(y\circ u)}{dx}\bigg|_x &= \dfrac{dy}{du}\bigg|_{u(x)}\cdot \dfrac{du}{dx}\bigg|_x \end{align} This fixes issues $(2)$ and $(3)$ mentioned above to an extent, but $(1)$ still remains an issue.
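If it helps, here is a small symbolic check of the displayed identity above (a sympy sketch; the inner function $u(x)=\sin x$ is an arbitrary choice made only for illustration):

```python
import sympy as sp

x = sp.symbols('x')
u = sp.sin(x)            # an arbitrary inner function u(x), for illustration only
y = lambda s: s**2       # the outer function y

lhs = sp.diff(y(u), x)                              # derivative of the composite y∘u at x
rhs = sp.diff(y(x), x).subs(x, u) * sp.diff(u, x)   # y'(u(x)) * u'(x)
print(sp.simplify(lhs - rhs))                       # 0
```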

You said in the comments that

I don't see much of a problem with $y$ depending on both $u$ and $x$, given that $u$ and $x$ are also related.

Well, if originally $y$ "depends on $u$", how can it all of a sudden "depend on $x$"? Of course, I know what you mean, but the proper way to indicate this dependence is not to say that "$y$ depends on $x$", but rather that the composite function $y\circ u$ depends on $x$. Here, you might think that this is just me being pedantic with language; and you're right. However, the reason I'm pedantic is that poor language and notation lead to conceptual misconceptions; this has been my experience both when studying and when observing questions on this site. For example, in this question, the OP finds that $\frac{\partial F}{\partial y} = 0$ and $\frac{\partial F}{\partial y} = -1$. The reason for this apparent contradiction is that the two $F$'s are actually completely different things (I also recall a question in the single variable context, but I can't seem to find it).


Regarding your other question

If I ask what is the derivative of $f(x)$ with respect to $\frac{x}{2}$, does this question make sense? Is it simply $f'(\frac{x}{2})$? Or do we have to express $x^2$ in terms of $\frac{x}{2}$? And how can we express this derivative using Lagrange's notation?

The answers in succession are "one could make sense of this question", "no", and "yes". Let me elaborate. So, here, we're assuming that $f:\Bbb{R}\to \Bbb{R}$ is given as $f(x) = x^2$. To make precise the notion of "differentiating with respect to $\frac{x}{2}$", one has to introduce a new function, $\phi:\Bbb{R}\to \Bbb{R}$, $\phi(t) = 2t$. Then, what you're really asking is what is the derivative of $f\circ \phi$? To see why this is the proper way of formalizing your question, note that \begin{align} f(x) &= x^2 = \left(2 \cdot \dfrac{x}{2}\right)^2 = 4 \left(\frac{x}{2}\right)^2 \end{align} and that $(f\circ \phi)(t) = f(2t) = (2t)^2 = 4t^2$. So this is indeed what we want.

And in this case, \begin{align} (f\circ \phi)'(t) &= f'(\phi(t)) \cdot \phi'(t) \\ &= [2 \cdot \phi(t)] \cdot [2] \\ &= [2\cdot 2t] \cdot 2 \\ &= 8t \end{align}

Notice how this is completely different from $f'\left(\frac{x}{2}\right) = 2 \cdot \frac{x}{2} = x$.
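A quick symbolic check of this computation (again a sympy sketch, illustrative only):

```python
import sympy as sp

t, x = sp.symbols('t x')
f = lambda s: s**2
phi = lambda s: 2*s

d_composite = sp.diff(f(phi(t)), t)                  # (f∘phi)'(t) = 8*t
f_prime_at_half_x = sp.diff(f(x), x).subs(x, x/2)    # f'(x/2) = x, a different thing
print(d_composite, sp.simplify(f_prime_at_half_x))   # 8*t x
```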

In general, when you have "___ as a function of $\ddot{\smile}$ " and you instead want to "think of ___ as a function of @", what is going on is that you have to use an extra composition. So, you need to have three sets $X,Y,Z$, a given function $f:Y\to Z$ (i.e. we think of elements $z\in Z$ as "functions of" $y\in Y$) and if you now want to think of "$z$ as a function of $x$", then what it means is that you somehow need to get a mapping $X\to Z$ which involves $f$. In other words, we need a certain mapping $\phi:X \to Y$ and then consider the composition $f\circ \phi$ (see for example the remarks towards the end of this answer).

Things can be slightly confusing when all the sets are the same $X=Y=Z = \Bbb{R}$, but in this case you should think of the three $\Bbb{R}$'s as "different copies" of the real line, and that each function maps you from one copy of the real line to another copy of the real line.


Edit:

Here's a passage from Spivak's Calculus text (Chapter 10, Question 33), where I first learnt about the double usage of the same letter.

[Image: passage from Spivak's Calculus, Chapter 10, Question 33]

peek-a-boo
  • 55,725
  • 2
  • 45
  • 89
  • 1
    Thank you very much, peek-a-boo. From the number of times that this question, or questions almost exactly like this one, have been asked on this site over and over again, one could argue that Leibniz notation is not the way to go (at least this is what I've gathered from my experience). Throughout these last months I've been trying to convey this sentiment in my answers related to this topic, but I think this post of yours sums up best exactly what I wanted to say. In the future I'll link up to this if you don't mind. – Jackozee Hakkiuz Aug 21 '20 at 03:35
  • 1
    Here are a couple of questions that I think illustrate the amount of confusion this can cause: this, this and this. – Jackozee Hakkiuz Aug 21 '20 at 03:42
  • @JackozeeHakkiuz yes, Leibniz's notation always comes with the implicit warning: equalities are true if interpreted properly, and the situation is so much worse in several variables (and a complete nightmare in the beginning if one starts Lagrangian mechanics where things like $\frac{\partial L}{\partial \dot{q}^i}$ are written). Those are some nice answers (but your first and third link lead to the same question) – peek-a-boo Aug 21 '20 at 12:14
  • @peek-a-boo Thanks very much, this was a very insightful answer. Here is my attempt to summarise what you are saying (please verify that I understand correctly). $1$) When working with functions, you can only really differentiate with respect to one of the function's arguments. E.g. if $f(x)=x^2$, then the derivative $f'(x)=2x$. When we 'differentiate $f(x)$ with respect to $x/2$', what we are really doing is creating a composite function. $f(x)$ depends on $x$, which in turn depends on $x/2$. Therefore, $f(x)=(g(x/2))^2$, where $g(x)=2x$. – Joe Lamond Aug 28 '20 at 22:45
  • $2$) The Leibnizian version of the chain rule glosses over this fact (at least when it is presented in the usual way). Therefore, it is less accurate. – Joe Lamond Aug 28 '20 at 22:49
  • I have a question, though. In Spivak's version, he writes that the 'chain rule ought to read $\frac{df(g(x))}{dx}=\frac{df(y)}{dy}|_{y=g(x)}\cdot\frac{dg(x)}{dx}$. However, I don't see why this can't be expressed more simply as $\frac{df(g(x))}{dx}=\frac{df(g(x))}{dg(x)}\cdot\frac{dg(x)}{dx}$. Is this just to emphasise that $g(x)$ is being treated as a variable, rather than a function in its own right, when it appears on the denominator? – Joe Lamond Aug 28 '20 at 22:53
  • @JoeLamond the first two comments are right. For the third, what does $\frac{df(g(x))}{dg(x)}$ even mean? If you really want to write it that way, then go ahead (as long as you know what you're doing). – peek-a-boo Aug 28 '20 at 23:28
  • now, suppose I want to write down $(f\circ g)'(2) = f'(g(2)) \cdot g'(2)$ in Leibniz notation. How would you write it? $\frac{d f(g(x))}{dx}\big|_{x=2} = \frac{d f(g(x))}{d g(x)}\big|_{g(x)= g(2)} \cdot \frac{dg(x)}{dx}\big|_{x=2}$? Is it really worth it trying to come up with such barbaric notation all in the hopes of trying to make the chain rule "look like cancellation" of differentials? – peek-a-boo Aug 28 '20 at 23:38
  • @peek-a-boo Writing it that way makes me think of the intuitive way of thinking about the chain rule. Let's say we have $y=(2x+5)^2$. It is actually pretty obvious what $dy/d(2x+5)$ is because $2x+5$ is just another variable. To make things more familiar though, we let $u=2x+5$, and so $y=u^2$, and so $dy/d(2x+5)=dy/du=2u=2(2x+5)$. I understand that having an expression such as $2x+5$ on the denominator might seem strange or non-standard, but I was wondering if there was anything conceptually wrong with the way I conceive of the chain rule. – Joe Lamond Aug 29 '20 at 19:04
  • @JoeLamond Let's see. You can define $$\frac{df(x)}{dx} := f'(x)$$ but would you be willing to write $f'(2)$ as $\frac{df(2)}{d2}$? I mean it can be done, but that doesn't mean it should, and also I don't think it makes anything clearer. – Jackozee Hakkiuz Sep 07 '20 at 06:35
  • @peek-a-boo I realise that this is quite an old post, but I'm still not exactly sure why the chain rule, written in Leibnizian form, is considered an abuse of notation. If $y=f(g(x))$, then $\frac{dy}{dx}=\frac{df(g(x))}{dx}=(f \circ g)'(x)$. But, setting $u=g(x)$, $y=f(u)$, meaning that $\frac{dy}{du}=f'(u)$. Hence, I think $dy/du$ has an unambiguous meaning. – Joe Feb 18 '21 at 10:16
  • $y$ and $x$ are linked through the intermediate variable $u$, meaning that $\frac{dy}{dx}=(f \circ g)'(x)$. On the other hand, there is a direct link between $y$ and $u$, namely that $y=f(u)$. Hence, $\frac{dy}{du}=f'(g(x))$. I do consider it to be an abuse of notation to write $\frac{df}{dx}=\frac{df}{dg} \cdot \frac{dg}{dx}$, however, since on the LHS it is unclear that '$df$' refers to the composite function $f \circ g$. Are there any problems with my line of thinking? Does it not generalise to higher dimensions very easily, for instance? – Joe Feb 18 '21 at 10:19
  • @Joe I believe I've already expressed myself as clearly as I can. Please take a closer look at the links I've provided and some of my other answers on related issues. If my answers still aren't satisfactory, then I'm sorry but I guess I'm not the right person to be asking. Anyway the bottom line is this: use whatever notation works best for you, but just be aware of the pitfalls of Leibniz's notation (i.e what types of manipulations are valid) and which are not. If you know what you're doing then of course you can write things however you wish. – peek-a-boo Feb 18 '21 at 10:35
  • Multivariable calculus and Leibniz's notation there is another whole can of worms. If you ever take a course in Lagrangian mechanics and see $\frac{\partial L}{\partial \dot{q}^i}$ and wonder things like "how can I treat position and velocity as independent variables" then that's another whole beast which just shows the inherent conceptual issues students face when first learning all these things using Leibniz's notation. Of course once you know what you're doing you can write things however you want, but it is my belief that to gain that understanding, one must first write things precisely. – peek-a-boo Feb 18 '21 at 10:39
  • @peek-a-boo: I agree with your last point. I've been reading Michael Spivak's Calculus, and it has really given me a deeper understanding of the subject. It means I feel more confident turning an intuitive argument into a moderately rigorous proof. Anyway, thanks for the help peek-a-boo. One other thing about Leibnizian notation is that it is predicated on the idea of a 'variable'. While we informally work with variables all the time, from what I understand modern formalisations of analysis do away with variables and focus solely on functions, since functions are a more precise way of... – Joe Feb 18 '21 at 11:33
  • ...formulating things. So when we write $f'(x)=2x$ for all $x$, and say that $x$ is a 'variable', really, what we mean is that $x$ can be substituted for any real number, and it gives a valid result. If I understand correctly, the modern conception of a variable is a static placeholder, rather than literally a 'varying' quantity. – Joe Feb 18 '21 at 11:35
  • @peek-a-boo: If you could address my final point about variables, then that would be very helpful. But don't worry if you don't have the time. – Joe Feb 18 '21 at 12:23
  • Hello, @peek-a-boo. I seem to be unable to link this approach with what Deepak did here. If we proceed with $\phi(t)=f^{-1}(t)$ or with $h(x)=2x+2f^{-1}(x)+\ln(x)$ ($h(f(x))=g(x)$), then $(g\circ \phi)'(x)=h'(x)=2+1/f'(x)+1/x$, which differs from what they got in their answer (they have $1/f(x)$ instead of $1/x$). Did they ever do it correctly, or am I missing some insight or motivation behind all this wicked process of "diff. $f$ wrt $g$"? Or is it the case where $\phi$ is not a simple inverse? – noballpointpen Feb 24 '24 at 13:02
  • @noballpointpen you’re mistaken. To calculate $\frac{dg(x)}{df(x)}$, you’re supposed to evaluate $(g\circ f^{-1})’$ at the point $f(x)$, not at the point $x$ (otherwise, the domains wouldn’t even match in general). Note that $(g\circ f^{-1})(y)=2y+f^{-1}(y)+\log y$. So, $(g\circ f^{-1})’(y)=2+\frac{1}{f’(f^{-1}(y))}+\frac{1}{y}$. So, evaluating at $f(x)$ gives $(g\circ f^{-1})’(f(x))=2+\frac{1}{f’(x)}+\frac{1}{f(x)}$, which is also consistent with the chain rule $(g\circ f^{-1})’(f(x))=g’[f^{-1}(f(x))]\cdot\frac{1}{f’[f^{-1}(f(x))]}=\frac{g’(x)}{f’(x)}$. – peek-a-boo Feb 24 '24 at 13:51
  • anyway, I should mention that this approach of introducing $f^{-1}$ is slightly less general (for this specific situation), because in order to make sense of the quotient $\frac{g’(x)}{f’(x)}$ one only needs differentiability of $f$ and $g$ at the single point $x$, and for $f’(x)\neq 0$; you don’t need to assume $f^{-1}$ exists. In one variable calculus, you can think of $\frac{dg(x)}{df(x)}$ as literally being short-hand notation for the number $\frac{g’(x)}{f’(x)}$ (see my other answer about ratios of exterior derivatives). In higher dimensions though, such a fraction wouldn’t make sense. – peek-a-boo Feb 24 '24 at 13:56
  • 1
    anyway, I think your big mistake was in thinking that $(f^{-1})’(x)=\frac{1}{f’(x)}$, when in reality, the correct rules are $(f^{-1})’(x)=\frac{1}{f’(f^{-1}(x))}$, or equivalently, that $(f^{-1})’(f(x))=\frac{1}{f’(x)}$. – peek-a-boo Feb 24 '24 at 14:00
  • Ahhh, the last one was a terrible mistake! – noballpointpen Feb 24 '24 at 14:08
  • @peek-a-boo Would it be acceptable and correct to explain all this muddle using precise notion of the chain rule statement for Leibniz notation, i.e.: Suppose $\phi$ is diff at $f(u)$ and $g=g(x)$ (using the same letter for clarity) is at $x=\phi(f(u))$, then the composite $g=g(\phi(f(u)))$ is diff. at $f(u)$, so that $\frac{dg}{df}=\frac{dg}{dx}\frac{dx}{df}$? As you extensively pointed out, all this is torturous... – noballpointpen Feb 24 '24 at 16:38
  • 1
    Yes the chain rule is always the explanation. – peek-a-boo Feb 24 '24 at 17:31
  • @peek-a-boo, I would like to discuss a remaining point. I thought about what $\phi$ would look like in general (for reals). What was clear to me was that even though $f'(x)\neq 0$, it remains not obvious why $dg/df$ would even be definable, if not for $f$ being locally invertible. Here a user claims that it is important for $f$ (in $dg/df$) to be locally 1-1. I see the following approaches: 1) require locally 1-1; 2) somehow ignore it and build $\phi$ around distinct values (it still would be diff. at $x$); 3) just ignore it and happily define $g'/f'$. – noballpointpen Mar 02 '24 at 15:06
  • 1
    For this situation my preference is to go with option 3. But since you asked: if there is an interval $I$ on which $f’$ does not vanish (this certainly happens if $f$ is $C^1$ and there is a point at which $f’(x_0)\neq 0$), then $f$ is strictly monotonic on that interval, so is invertible there, and the inverse is differentiable (I’m essentially repeating what the Inverse function theorem states). – peek-a-boo Mar 02 '24 at 16:12
  • Huh, my textbook didn't mention anything about $f$ being continuously differentiable for the hypothesis of this theorem (wikipedia proposes it), and I didn't bother to gather more information about the theorem. I see your point clearly, I was just thinking about extreme and ugly cases. Thank you for your time :) – noballpointpen Mar 02 '24 at 19:39
  • @noballpointpen I said $f$ being $C^1$ is *sufficient*. In one dimension, the IFT doesn’t need $C^1$. – peek-a-boo Mar 02 '24 at 19:57
0

Differentiation maps what I'll call "vanilla" functions (e.g. functions from reals to reals, but which functions we consider "vanilla" is context-dependent) to vanilla functions; differentiation at a point obtains a vanilla function by differentiation, then evaluates that function at said point. These two processes are related by currying/uncurrying. So $\frac{d}{dx}$ is a vanilla-function-to-vanilla-function function, a decidedly non-vanilla function you might hear called a functional or operator in various contexts.

A further note on the not-a-functional functions I called "vanilla": such functions might map from one space of points to another, & differentiation can move from one space of such functions to another. For example, $\nabla$ sends $f(x,\,y)$, function from $\Bbb R^2$ to $\Bbb R$, to a function from $\Bbb R^2$ to $\Bbb R^2$.
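For instance, a minimal sympy sketch (with $f(x,\,y)=x^2y$ chosen arbitrarily) of how $\nabla$ sends a function from $\Bbb R^2$ to $\Bbb R$ to a function from $\Bbb R^2$ to $\Bbb R^2$:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y                                        # a function R^2 -> R (arbitrary example)

grad_f = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])  # its gradient: a function R^2 -> R^2
print(grad_f.T)                                     # Matrix([[2*x*y, x**2]])
```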

As for the chain rule, it's short for$$\lim_{h\to0}\frac{y(x+h)-y(x)}{h}=\lim_{k\to0}\frac{y(u(x)+k)-y(u(x))}{k}\lim_{H\to0}\frac{u(x+H)-u(x)}{H}.$$The Leibniz formulation glosses over the distinction between $u$ being the independent variable in $\frac{dy}{du}$ & its being the dependent variable in $\frac{du}{dx}$. All the same, we can make sense of differentiating $y=x^2$ with respect to $u=\frac{x}{2}$ this way. Either you can say$$y=4u^2\implies\frac{dy}{du}=8u,$$or you can get the same result from$$\frac{dy}{du}=\frac{\frac{dy}{dx}}{\frac{du}{dx}}=\frac{2x}{\frac12}=4x=8u.$$
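Both routes can be checked symbolically; here is a minimal sympy sketch (illustrative only):

```python
import sympy as sp

x = sp.symbols('x')
y = x**2      # y as an expression in x
u = x / 2     # u as an expression in x

dy_du = sp.diff(y, x) / sp.diff(u, x)   # (dy/dx)/(du/dx) = 2x / (1/2) = 4x
print(sp.simplify(dy_du - 8*u))         # 0, confirming dy/du = 8u
```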

J.G.
  • 115,835
0

I would also like to point out one more interpretation, which is more intuitive and which helped me pin down the meaning behind everything said about the problem. You can think of \begin{align}\frac{df}{dg}\end{align} as the slope of the tangent line to the parametric curve defined by the points $(g(t),f(t))$ at a given point.

If $f$ and $g$ are differentiable functions on $(a,b)$, and if the parametric equations $y=f(t)$, $x=g(t)$ determine $y$ as a differentiable function $h$ of $x$ on an arc of the curve defined by points $(g(t),f(t))$ ($t\in(a,b)$) that extends beyond a point $P=(g(t_0),f(t_0))$, and if $g'(t_0)\neq0$, then \begin{align}\frac{dy}{dx}\biggr|_{P}=\frac{f'(t_0)}{g'(t_0)}\end{align}

And the expression $dy/dx$ here can be thought of as $df/dg$. Now, I think, the seemingly ridiculous phrase "differentiate $x^2$ with respect to $x/2$" makes much more sense.
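As a concrete illustration of this parametric viewpoint, here is a minimal sympy sketch (the parametrisation $f(t)=t^2$, $g(t)=t/2$ is chosen to match the example above):

```python
import sympy as sp

t = sp.symbols('t')
f = t**2      # y = f(t): the original x^2, with the parameter t playing the role of x
g = t / 2     # x = g(t): the "new variable" x/2

slope = sp.diff(f, t) / sp.diff(g, t)   # slope of the tangent to the curve (g(t), f(t))
print(slope)                            # 4*t, i.e. 8*(t/2), matching the other answers
```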