
I am asking this question simply out of curiosity.

Why is the second derivative denoted by $\frac{d^{2}f(x)}{dx^2}$ and not by $\frac{d^{2}f(x)}{d^2x}$ or $\frac{d^{2}f(x)}{(dx)^2}$? Is there any mathematical reasoning behind it?

It seems like a similar question has been asked here before, but the same discussion happened there too, i.e., why $\frac{d^{2}f(x)}{(dx)^2}$ is not the correct notation :/

IY3
  • I believe that it relies on the fact that you can also take a mixed second derivative with respect to two variables, where it looks like ${d^2f(x,y)\over dx\,dy}$. I believe the second example expression is actually equivalent to the conventional notation. – opfromthestart Jun 17 '21 at 14:53
  • It's shorthand for $\frac{d}{dx}\,\frac{df}{dx}$ with the "like terms" "combined" when "multiplied". Your second suggestion would be the "correct" way to multiply the "denominators" together, but the founders of the notation evidently liked simplicity more than correctness. – Matthew Daly Jun 17 '21 at 14:54
  • @MatthewDaly Ahh! Mathematics should believe more in correctness than simplicity – IY3 Jun 17 '21 at 14:56
  • @HansLundmark Thanks! But it did not answer the question. I am wondering why $\frac{d^{2} f(x)}{(dx)^2}$ is not the correct usage; the same discussion took place there also. – IY3 Jun 17 '21 at 14:59
  • I believe it also relies on the fact that $dx$ is considered independent of $x$. In the original notation, it actually referred to an infinitesimal, so $dx^2=(dx)^2$ irrespective of $x$. – opfromthestart Jun 17 '21 at 14:59
  • @InuyashaYagami I would say that $(dx)^2$ in the denominator is not technically incorrect but so overwhelmingly uncommon that it is wrong “by convention”, because using it makes people wonder if you mean something different. – Eike Schulte Jun 17 '21 at 17:28

1 Answer


There is a reason for the notation, but the present notation has its own problems.

The reason for the notation is that the derivative is a differentiation followed by a division by $dx$. Historically, we didn't always deal in derivatives but often used differentials (Newton was primarily derivatives, Leibniz was primarily differentials). So, the differential of, say, $x^2$ is $2x\,dx$.

You get a derivative by taking the differential of a function and dividing by $dx$. You get the second derivative by doing it again, so you end up dividing by $dx$ twice, which puts $dx^2$ in the denominator.
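For instance, if we treat $dx$ as a constant (so $d(dx) = 0$, which is what the classical notation tacitly assumes), take $y = x^3$:

$$dy = 3x^2\,dx, \qquad d(dy) = d(3x^2)\,dx = 6x\,dx\,dx = 6x\,(dx)^2,$$

and dividing by $(dx)^2$ gives $\frac{d^2y}{dx^2} = 6x$, as expected.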

However, this doesn't quite match what we have in the present notation. A fuller notation can be found by actually taking the derivative of the derivative, i.e., treating the derivative $\frac{dy}{dx}$ as a fraction, differentiating it (using the quotient rule, since it is a quotient), and then dividing by $dx$.

This yields:

$$y'' = \frac{d^2y}{dx^2} - \frac{dy}{dx}\frac{d^2x}{dx^2}$$

Here $du^2$ is shorthand for $(d(u))^2$ and $d^2u$ is shorthand for $d(d(u))$. Using the form above, you can actually treat the second derivative in an algebraic manner (canceling denominators, multiplying "both sides" by something, etc.). With the traditional notation, you can only really manipulate it using the "chain rule for the second derivative."
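For reference, the "chain rule for the second derivative" is the statement that, if $y$ is a function of $x$ and $x$ is a function of some parameter $t$, then

$$\frac{d^2y}{dt^2} = \frac{d^2y}{dx^2}\left(\frac{dx}{dt}\right)^2 + \frac{dy}{dx}\,\frac{d^2x}{dt^2}.$$

If you divide this through by $\left(\frac{dx}{dt}\right)^2$ and solve for $\frac{d^2y}{dx^2}$, you get back the two-term expression above, with each differential now taken with respect to $t$ (holding $dt$ constant).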

Here's a fuller derivation. We will use the full notation $d(u)$ so that everything is clearer:

$$
\begin{aligned}
y'' &= \frac{d\left(\frac{d(y)}{d(x)}\right)}{d(x)} \\
&= \frac{\frac{d(x)\, d(d(y)) - d(y)\, d(d(x))}{(d(x))^2}}{d(x)} \quad \text{(quotient rule on top)} \\
&= \frac{d(x)\, d(d(y)) - d(y)\, d(d(x))}{(d(x))^3} \quad \text{(simplifying denominators)} \\
&= \frac{d(x)\,d(d(y))}{(d(x))^3} - \frac{d(y)\,d(d(x))}{(d(x))^3} \quad \text{(splitting the fraction)} \\
&= \frac{d(d(y))}{(d(x))^2} - \frac{d(y)}{d(x)}\,\frac{d(d(x))}{(d(x))^2} \quad \text{(simplifying)}
\end{aligned}
$$
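If you want to sanity-check that last line, here is a short SymPy sketch (just a sketch, under the interpretation that $d(u) = u'(t)\,dt$ and $d(d(u)) = u''(t)\,dt^2$ for some parameter $t$, with $dt$ held constant so the $dt$'s cancel out of every ratio):

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)
y = sp.Function('y')(t)

# The second derivative of y with respect to x, obtained by applying
# d/dx = (1/x'(t)) d/dt twice:
dy_dx = sp.diff(y, t) / sp.diff(x, t)
d2y_dx2 = sp.diff(dy_dx, t) / sp.diff(x, t)

# The two-term expression, with d(u) -> u'(t) dt and d(d(u)) -> u''(t) dt^2;
# the dt's cancel in every ratio, so we simply drop them:
dx_, dy_ = sp.diff(x, t), sp.diff(y, t)
ddx, ddy = sp.diff(x, t, 2), sp.diff(y, t, 2)
two_term = ddy / dx_**2 - (dy_ / dx_) * (ddx / dx_**2)

print(sp.simplify(d2y_dx2 - two_term))  # prints 0, so the two agree
```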

Not 100% sure, but I think the "normal" notation came about because, if you imagine the independent variable moving at a constant rate (so $d(x)$ is constant), then, by definition, the differential of a constant is zero, which wipes out the second term. However, this fails if $x$ is a function of some other variable, and therefore the notation typically used doesn't generalize. I think it gained popularity in physics because physicists generally used time as the independent variable, and, in the time before we knew about relativity, the flow of time was considered constant. If you are working with something whose independent variable isn't so independent, you can still manipulate the traditional notation using the chain rule for the second derivative. In the expanded notation here, such a rule isn't strictly necessary, as you can simply do algebraic manipulations to accomplish the same thing (but the chain rule can help you remember which manipulations to do!).
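To see the failure concretely, suppose $x = t^2$ and $y = t^4$, so that $y = x^2$ and the second derivative of $y$ with respect to $x$ should be $2$. Taking $t$ as the independent variable (so $d(dt) = 0$), we get

$$dy = 4t^3\,dt, \quad d(dy) = 12t^2\,dt^2, \qquad dx = 2t\,dt, \quad d(dx) = 2\,dt^2.$$

The first term alone, $\frac{d(d(y))}{(d(x))^2} = \frac{12t^2\,dt^2}{4t^2\,dt^2} = 3$, is wrong, while the full two-term expression gives $3 - \frac{4t^3\,dt}{2t\,dt}\cdot\frac{2\,dt^2}{4t^2\,dt^2} = 3 - 1 = 2$, as it should.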

For more information, see my paper "Extending the Algebraic Manipulability of Differentials". It's available on arXiv.

johnnyb