10

In my physics class my professor was abusing the derivative, as in so many physics classes I've been in. This time, he took the quantity $(x+dx)(y+dy)$ and argued that the $dx\,dy$ term should disappear because it's so much smaller than the rest (despite $dx$ and $dy$ both being infinitesimal...). In any case, I know this is related to non-standard analysis, or something of the sort, and I was wondering if someone could explain, in whatever light is proper, why the product of two infinitesimals can be said to be zero. With whatever wonderfully terrible mathematical rigor is required.

  • It's not closely related to non-standard analysis; it's related to the way physicists do mathematics, which is not exactly the same as the way mathematicians do it. – MJD Sep 08 '14 at 21:43
  • Physics types often use $\delta x, \delta y$ where maths types might use $\delta,\epsilon$. Then they just write $dx$ instead of $\delta x$. Are you sure it was not a legitimate $\epsilon$-$\delta$ argument? – almagest Sep 08 '14 at 21:44
  • If you think of $dx$ as a physicist's way of thinking of an actual finite approximation this makes more sense. (said the one-time physicist) – amcalde Sep 08 '14 at 21:46
  • For a number system in which this sort of manipulation is completely justified, see the dual numbers. – Ben Grossmann Sep 08 '14 at 21:52
  • Interestingly enough, this was how Leibniz "proved" the product rule (or at least, most people believe it was Leibniz). He wanted to find $d(uv)$ where $u(x)$ and $v(x)$ were differentiable functions. He got $d(uv) = (u + du)(v + dv) - uv = u\,dv + v\,du + du\,dv$. He then argued that the $du\,dv$ was much smaller than a "normal" differential, and could thus be disregarded (yielding the correct formula for the product rule, $\frac{d}{dx}(uv) = \frac{du}{dx} v + \frac{dv}{dx} u$). – user28375028 Sep 08 '14 at 22:01
  • If you could give more context, you might get a more rigorous answer. In many cases these computations reduce to a differential equation that must be solved to compute the answer. – copper.hat Sep 08 '14 at 22:06
  • What I was wondering was something like what @AlexMiller said: disregarding the cross term and arriving at a correct (and exact!) answer. I'm not talking about an approximation. –  Sep 09 '14 at 01:17
  • LOL "abusing the derivative." – user4894 Sep 23 '16 at 01:17

7 Answers

9

There are several perfectly rigorous ways to formalize this kind of reasoning, none of which require any nonstandard analysis (which you should be quite suspicious of, as it relies on a weak choice principle to even get off the ground).

One of them is, as Robert Israel says, interpreting statements about infinitesimals as statements about limiting behavior as some parameter tends to zero. For example, you can define what it means for a function $f(x)$ to be differentiable at a point: it means there is some real number $f'(x)$ such that (in little-o notation)

$$f(x + \epsilon) = f(x) + f'(x) \epsilon + o(|\epsilon|)$$

as $\epsilon \to 0$. After you prove some basic lemmas about how little-o notation works, you get some very clean and intuitive proofs of basic facts in calculus this way. For example, here's the product rule:

$$\begin{eqnarray*} f(x + \epsilon) g(x + \epsilon) &=& \left( f(x) + f'(x) \epsilon + o(|\epsilon|) \right) \left( g(x) + g'(x) \epsilon + o(|\epsilon|) \right) \\ &=& f(x) g(x) + (f'(x) g(x) + f(x) g'(x)) \epsilon + o(|\epsilon|). \end{eqnarray*}$$

After writing down a bunch of arguments like this, if you're familiar with elementary ring theory it becomes very tempting to think of expressions that are $o(|\epsilon|)$ (meaning they grow more slowly than $|\epsilon|$ as $\epsilon \to 0$) as an ideal that you can quotient out by, and this intuition can also be formalized.

More precisely, in the ring $R = C^{\infty}(\mathbb{R})$ of smooth functions on $\mathbb{R}$, for any $r \in \mathbb{R}$ there's an ideal $(x - r)$ generated by the function $x - r$, consisting of all functions vanishing at $r$. Working in the quotient ring $R/(x - r)$ amounts to only working with the value at $r$ of a function. Working in the quotient ring $R/(x - r)^2$, though, amounts to working with both the value at $r$ and the first derivative at $r$, with multiplication given by the product rule. Similarly, working in $R/(x - r)^{n+1}$ amounts to working with the value at $r$ and the first $n$ derivatives at $r$.
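
Working in $R/(x - r)^2$ is exactly arithmetic on (value, first derivative) pairs, i.e. dual numbers with $\epsilon^2 = 0$, which makes it easy to compute with directly. Here is a minimal sketch (the `Dual` class and its tiny API are my own illustration, not a standard library):

```python
# Arithmetic in R/(x - r)^2: carry a value and a first derivative, and let
# every product of two "derivative parts" vanish (eps^2 = 0).
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value   # f(r): the value of the coset representative
        self.deriv = deriv   # f'(r): the coefficient of (x - r)

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # (a + a' eps)(b + b' eps) = ab + (a'b + ab') eps since eps^2 = 0:
        # this multiplication rule IS the product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    def __repr__(self):
        return f"{self.value} + {self.deriv}*eps"

# The function x itself at the point r = 3: value 3, derivative 1.
x = Dual(3.0, 1.0)
print(x * x)        # 9.0 + 6.0*eps   ->  (x^2)' = 2x = 6 at x = 3
print(x * x * x)    # 27.0 + 27.0*eps ->  (x^3)' = 3x^2 = 27 at x = 3
```

The derivatives of $x^2$ and $x^3$ at $r=3$ fall out of the multiplication rule alone; no limits are taken anywhere.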

Taking ideas like this seriously leads to things like formal power series, germs of functions, stalks of sheaves, jet bundles, etc. etc. It is all perfectly rigorous mathematics, and nonstandard analysis is a huge distraction from the real issues.

Qiaochu Yuan
  • Your first equation is wrong - the correct equation is $f(x + \varepsilon) = f(x) + \varepsilon f'(x)$. –  Dec 16 '15 at 20:35
  • @mistermarko: I'm not treating $\epsilon$ as an infinitesimal here, just as a parameter that tends to zero. – Qiaochu Yuan Dec 16 '15 at 20:42
  • The OP probably won't clarify his comment about the normal solution so I can't say any more. –  Dec 16 '15 at 20:46
  • I'd advise the OP not to take too seriously the comment on choice. Though it's true that choice plays a central role in constructing the hyperreal numbers, I don't think it should be cause for concern. – AJY Feb 12 '17 at 03:02
4

In nonstandard analysis one can define derivatives without using limits: if $dx$ is an infinitesimal, that is, a number greater than zero but less than every positive real number, then $f'(x)$ can almost be computed as $[f(x+dx)-f(x)]/dx$. To get the same result as in standard analysis, one then takes the "standard part" of this, the closest real number, which amounts to the same throwing away of higher-order infinitesimals that your physics professor did.

Here are two explicit examples. Let's compute the derivative of $f(x)=x^2$. Let $dx$ be infinitesimal. Then $f(x+dx)-f(x)=x^2+2xdx+(dx)^2-x^2=2xdx+(dx)^2$. Dividing by $dx$ we get $2x+dx$. For $x$ a real number it's hopefully intuitive that the standard part of $2x+dx$ is $2x$, and so we get our familiar identity $f'(x)=2x$.
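
Floating-point numbers contain no genuine infinitesimals, so here is only a numeric stand-in for this computation (my own illustration, not nonstandard analysis itself): let $dx$ shrink and watch $2x+dx$ settle on its "standard part" $2x$.

```python
# Numeric stand-in for the hyperreal computation: for f(x) = x^2 the
# difference quotient is exactly 2x + dx, whose "standard part" is 2x.
f = lambda x: x**2
x = 5.0
for dx in [1e-1, 1e-3, 1e-5]:
    print(dx, (f(x + dx) - f(x)) / dx)   # about 10.1, 10.001, 10.00001 -> 10
```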

Now let's look at the product rule, which is the sort of situation in which your professor's argument might come up. We have $$(fg)'(x)\,dx\approx f(x+dx)\,g(x+dx)-f(x)\,g(x)$$ $$=[f(x)+f'(x)\,dx+c_1(dx)^2][g(x)+g'(x)\,dx+c_2(dx)^2]-f(x)\,g(x)=(f'g+g'f)\,dx+c_3(dx)^2$$ Here we're using Taylor's theorem to expand $f$ and $g$; in the familiar context we say the $c_i$ don't go to infinity as $dx\to 0$, which in the nonstandard context is just to say the $c_i$ are not infinite for infinitesimal $dx$.

So here the $(dx)^2$ term will disappear, as your professor suggested, when we take the standard part of the derivative. But this only makes sense after we've subtracted $f(x)g(x)$! Then we're justified in cutting off at the standard, or real, part of our expression; saying $(x+dx)(y+dy)=xy+y\,dx+x\,dy$ is rather arbitrary in comparison.

Anyway, this discussion requires justifying the existence of infinitesimals, and our ability to compute with them as we do with reals, even applying Taylor's theorem to them. The full justification of this theory involves understanding a couple of logical topics: first-order predicate logic and ultraproducts. These aren't overwhelmingly technical, but have little to do with how the theory is used. For that, it's enough to know the

Transfer Principle: All the same things are true of the extended reals with infinitesimals as of the standard reals, provided they can be stated without saying "For every subset of $\mathbb{R}$..." or something equivalent.

(With apologies for the lack of precision in this statement; I hope it gets the point across.) Being careful with the transfer principle is probably where nonstandard analysis wins out over informal physical reasoning; that is, it lets us decide exactly when this sort of argument is reasonable. Specific examples: the nonstandard reals and differentiable functions on them do satisfy the intermediate value theorem and Taylor's theorem, but do not satisfy the least upper bound property.

Kevin Carlson
  • This was nice to read, I'm still a little confused though. I can see how in some way we can say the higher orders of the differentials go to zero, but as Columbus mentioned in his answer, what if we were to divide by something like $(dx)^2$. Would those higher order terms then be relevant? (I don't think this would be a second derivative, though.) And then the standard part would blow up, as it would have $dx$ in the denominator... –  Sep 09 '14 at 01:33
  • I can also see a reason why you would call what you called the standard part the standard part, but this seems so much like an approximation, I'm still confused why it isn't. Saying that the "higher order terms in $dx$" go to zero seems to imply something is "left over" that we're ignoring... I don't know. These questions might be ill phrased. –  Sep 09 '14 at 01:35
  • You can certainly divide by $(dx)^2$, but you only get a finite answer if there are no terms of order $1$ or $dx$ in your expression. There is indeed something left over, but it's actually infinitely small, as distinct from the usual case when it's arbitrarily small but still bigger than some real number. One justification for taking the standard part is that infinitely small quantities can't be observed. There are models in which $dx\,dy$ would be literally zero, but then there would be no way to keep track of such second-order values, which you might want for, say, second derivatives. – Kevin Carlson Sep 09 '14 at 03:15
4

One way of thinking about this is using a parameter $\epsilon$ as $\epsilon \to 0$. If $dx = O(\epsilon)$ and $dy = O(\epsilon)$ while $x$ and $y$ do not depend on $\epsilon$, then $dx\; dy = O(\epsilon^2)$, so it's correct to say $$ (x + dx)(y + dy) = xy + x\; dy + y\; dx + O(\epsilon^2)$$
And this can be manipulated further, perfectly rigorously, using the standard rules of big-O notation.
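
To see the big-O bookkeeping in action, here is a small numeric check (the proportionality constants $dx = 2\epsilon$, $dy = 3\epsilon$ and the point are my own arbitrary choices): the discarded term, divided by $\epsilon^2$, stays bounded.

```python
# Sanity check of the O(eps^2) claim: with dx = 2*eps and dy = 3*eps,
# the discarded piece divided by eps^2 stays bounded (in fact constant).
x, y = 1.5, -0.7
for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx, dy = 2 * eps, 3 * eps
    error = (x + dx) * (y + dy) - (x * y + x * dy + y * dx)
    print(eps, error / eps**2)   # about 6.0 every time: error = dx*dy = 6*eps^2
```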

Robert Israel
  • I fear I was misleading with my question: this wasn't an approximation! The problem is too long to type out, but the answer we got by ignoring this term was the answer you get handling it the proper way with derivatives! –  Sep 09 '14 at 01:21
  • Yes, but derivatives are limits of difference quotients, and the statements involving $dx$ and $dy$ correspond to statements about those difference quotients in the limit as $dx$ and $dy$ go to $0$. – Robert Israel Sep 09 '14 at 03:38
  • @user82004 You should show us how you got the answer with derivatives then. –  Dec 16 '15 at 20:09
3

$$ \frac{\Big(xy + x\,\Delta y + y\,\Delta x + \Delta y\,\Delta x\Big) - xy }{\Delta t} = \underbrace{x\frac{\Delta y}{\Delta t} + y \frac{\Delta x}{\Delta t}}_A + \underbrace{\frac{\Delta y\,\Delta x}{\Delta t}}_B $$ $$ \overbrace{\frac{\Delta y\,\Delta x}{\Delta t} = \frac{\Delta y}{\Delta t}\Delta x = \frac{\Delta x}{\Delta t}\Delta y}^B $$ The expression labeled $B$ approaches $0$ since $\dfrac{\Delta y}{\Delta t}$ approaches a finite number and it is then multiplied by $\Delta x$, which approaches $0$. And similarly for the last expression above.
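
A quick numeric sketch of this claim (the concrete functions $x(t)=t^2$, $y(t)=\sin t$ and the point $t=1$ are my own choices for illustration): $A$ settles on a finite number while $B$ dies off linearly in $\Delta t$.

```python
import math

# Numeric sketch: with x(t) = t^2 and y(t) = sin(t) (arbitrary choices),
# the term A tends to the finite number x y' + y x', while B tends to 0.
x = lambda t: t**2
y = lambda t: math.sin(t)
t = 1.0
for dt in [1e-1, 1e-2, 1e-3, 1e-4]:
    dx = x(t + dt) - x(t)
    dy = y(t + dt) - y(t)
    A = x(t) * dy / dt + y(t) * dx / dt   # -> x y' + y x'  (about 2.223)
    B = dy * dx / dt                      # -> 0: one factor of dt survives
    print(dt, A, B)
```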

1

The best explanation of the step converting $x\,dy+y\,dx+dx\,dy$ to the expression $x\,dy+y\,dx$ is still Leibniz's, in terms of a generalized relation of equality he sometimes denoted by a symbol similar to "$\,{}_{\ulcorner\!\urcorner}\,$". Here $a\,{}_{\ulcorner\!\urcorner}\,b$ means that $\frac{a}{b}$ is infinitely close to $1$. Thus we have simply $$x\,dy+y\,dx+dx\,dy\;\;{}_{\ulcorner\!\urcorner}\;\;x\,dy+y\,dx$$ A similar formula holds when calculating the derivative of $y=x^2$, obtaining $\frac{dy}{dx}\;{}_{\ulcorner\!\urcorner}\;2x$.

No need for either little-o or big-O. Leibniz was explicit in describing the relation he was using as more general than equality, as a relation "up to" a negligible term. He wrote this, in particular, in a published article in 1695.

The point is to choose what Leibniz called an "assignable" value as the value for the derivative. Here $2x$ is assignable, whereas $dx$ (as well as $2x+dx$) is what Leibniz referred to as inassignable; he explicitly described his infinitesimals as "inassignable".
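
A numeric sanity check of Leibniz's relation (with small finite $dx = dy = h$ standing in for genuine infinitesimals; the concrete values are my own illustration): the ratio of the two sides tends to $1$.

```python
# Leibniz's relation asserts that the ratio of the two sides is infinitely
# close to 1. With small finite dx = dy = h standing in for infinitesimals:
x, y = 2.0, 3.0
for h in [1e-1, 1e-3, 1e-5]:
    dx = dy = h
    ratio = (x * dy + y * dx + dx * dy) / (x * dy + y * dx)
    print(h, ratio)   # 1.02, 1.0002, 1.000002: the ratio tends to 1
```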

Mikhail Katz
  • This is an interesting alternative to troubling abuses of what "=" means. "$a = b$" means that $a$ can be rewritten as $b$, and $b$ can be rewritten as $a$. $$\begin{align} (x+dx)^2-x^2 &= 2\,x\,dx + dx\,dx \\ &= 2\,x\,dx \end{align}$$ would mean that $dx\,dx = 0$, and that you can just add a $dx$ as a square root of zero wherever you like. – Rob Nov 06 '23 at 01:50
  • @Rob, I am not sure what you mean. A modern formalisation of Leibniz's relation of "generalized equality" is in terms of the standard part. – Mikhail Katz Nov 06 '23 at 05:27
  • It's this that bothers me. Equals generally means "can be substituted for in both directions". So $$2\,x\,dx + 0 = 2\,x\,dx + dx\,dx + dx\,dx + \cdots$$ says that $dx\,dx = 0$ (substitute in both directions). Then your computer needs to handle this: $\frac{d^2y}{dx^2}$. Some of these things come out really strangely when you try to implement them on a computer, where the rules are very literal and there is nobody standing over the computer's shoulder to object to something strange being done. – Rob Nov 07 '23 at 01:38
  • $$\begin{align} f &= x^2 + y^2 \\ d[\,f &= x^2 + y^2\,] \\ df &= 2x\,dx + 2y\,dy \\ \frac{df}{dx} &= 2x + 2y\,\frac{dy}{dx} && \text{it isn't clear what is infinitesimal here} \end{align}$$ It seems that you don't want to take the $st[\,]$ for implicit differentiation, but perhaps when you differentiate with respect to a variable you might. Automatic differentiation in computer libraries sets one $d$ to infinitesimal and the others to $0$, instead of $dx^2=0$. – Rob Nov 07 '23 at 03:13
  • @Rob, here you attempted to deal with a function of two variables, in which case the notation is not sufficiently explicit whether or not you use infinitesimals. Even for a single variable, Leibniz did not have a complete answer; see this post. My point was that Leibniz was not so naive as to think that one has a literal equality $2x+dx=2x$. Note that even l'Hopital's book does not contain such an equality, contrary to claims in many books. – Mikhail Katz Nov 07 '23 at 12:02
  • The way that machine learning libraries work is that they need to differentiate with respect to a large number of variables, into the billions. So they effectively just implicitly differentiate the result once. Then for $da, db, dc, \ldots$ you pick which direction you want for the gradient, like $dx$, divide by it, and set all the other $da, db, dc, \ldots$ to zero. Something is defined as a constant by asserting that $d[c]=0$. It's infinitesimal if $d[x]*d[x]=0$ and $d[x]>0$. It's set to just zero, $d[y]=0$, to hold it constant. – Rob Dec 19 '23 at 02:37
  • @Rob, this is very interesting, because this is how nilsquare infinitesimals behave in Synthetic Differential Geometry; see [tag:Synthetic-Differential-Geometry]. It would appear that SDG may be a preferable formalisation of what actually goes on with machine learning libraries, where, as you describe, $d[x]*d[x]=0$. – Mikhail Katz Dec 19 '23 at 10:49
0

Answer left here as an example to stop others from using the same incorrect reasoning. The incorrect reasoning is as follows:

I think he is arguing something along these lines:

If $dx$ and $dy$ are infinitesimal, then $(dx)^2$, $(dy)^2$ and $dxdy$ are an order of magnitude even smaller and are hence negligible compared to $dx$ and $dy$.

e.g. $0.001$ and $0.002$ are very small numbers but their product ($0.001*0.002=0.000002$) is negligible compared to either of them.

Problem with the reasoning: if $dy\,dx$ is negligible compared with $dx$, and can be disregarded, then $dy$ is negligible compared with $y$, and can also be disregarded. I suspect the idea is that although $dy$ isn't negligible when compared with $x$ or $y$, $dx\,dy$ is negligible compared with $x$ and $y$, and can be disregarded in computations involving those.

DKNguyen
Mufasa
  • If that's the argument, it's a failure. If $dy\,dx$ is negligible compared with $dx$, and can be disregarded, then $dy$ is negligible compared with $y$, and can also be disregarded. I suspect the idea is that although $dy$ isn't negligible when compared with $x$ or $y$, $dx\,dy$ is negligible compared with $x$ and $y$, and can be disregarded in computations involving those. – MJD Sep 08 '14 at 21:53
  • @MJD - Good point. I completed missed that when thinking about this. Thanks for the clarification. – Mufasa Sep 08 '14 at 21:56
  • What is the normal etiquette for cases like this: should I delete my answer or leave it there to avoid others making the same mistake? – Mufasa Sep 08 '14 at 21:59
  • @Mufasa deleting it is fine. If you want to leave it as an example, change it to community wiki (by clicking the appropriate tick-box when you hit edit) and indicate clearly at the top that you've left this question as something to avoid. – Ben Grossmann Sep 08 '14 at 22:02
  • @Omnomnomnom - thx for the tip. What exactly does ticking the community wiki checkbox do? – Mufasa Sep 08 '14 at 22:05
  • @Mufasa Basically, it makes it so that your post no longer belongs to you, but the community. In particular, the downvotes will no longer be subtracted from your rep. – Ben Grossmann Sep 08 '14 at 22:08
  • I'm not sure I'd call this answer "incorrect reasoning". I think this is standard intuition when physicists manipulate $dx$ and $dy$. It might not be rigorous but physicists derive things like this all the time. – littleO Sep 08 '14 at 22:47
  • @littleO The problem with this answer is that, as written, it says that $dx\,dy$ is negligible compared with $dx$ and $dy$. This is flat-out wrong. If any argument like that is going to be made, it should be that $dx\,dy$ is negligible compared to $x$ and $y$. You managed to miss this error once in the original post and then once when I pointed it out in my comment. – MJD Sep 09 '14 at 00:36
0

Let me give, as an example, the derivation of the product rule.

Informal version:

\begin{align}d(uv)&=\left((u+du)(v+dv)\right)-\left(uv\right)\\ d(uv)&=u\,dv+v\,du+du\,dv\end{align} We let the $du\,dv$ term disappear: \begin{align}d(uv)&=u\,dv+v\,du\\ \frac{d(uv)}{dx}&=u\frac{dv}{dx}+v\frac{du}{dx} \end{align}

Formal version:

For functions, $\Delta f$ means $f(x+\Delta x)-f(x)$ where $\Delta x$ is a small, but not infinitesimal, quantity. Thus, $\lim_{\Delta x\to0}\frac{\Delta f}{\Delta x}=\frac{df}{dx}$.

\begin{align}\Delta(uv)&=\left((u+\Delta u)(v+\Delta v)\right)-\left(uv\right)\\ \Delta(uv)&=u\Delta v+v\Delta u+\Delta u\Delta v\\ \frac{\Delta(uv)}{\Delta x}&=u\frac{\Delta v}{\Delta x}+v\frac{\Delta u}{\Delta x}+\frac{\Delta u}{\Delta x}\Delta v\end{align} Take the limit as $\Delta x$ goes to zero. Notice how the last term vanishes: $\frac{\Delta u}{\Delta x}$ tends to the finite number $\frac{du}{dx}$ while $\Delta v$ tends to $0$.
\begin{align}\frac{d(uv)}{dx}&=u\frac{dv}{dx}+v\frac{du}{dx}\end{align}


So the reason we could treat $du\,dv$ as zero is that, later, we only divided by $dx$ (so the term became zero when we took the limit). If we were to divide by $dx^2$ instead, it would not become zero; however, we usually only divide by $dx$ once.
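
A numeric sketch of that last point (the functions $u=x^2$, $v=x^3$ at $x=1$ are my own choice of example): one division by $\Delta x$ leaves a cross term that still vanishes in the limit, while a division by $\Delta x^2$ would leave something finite and nonzero.

```python
# The cross term Δu Δv survives one division by Δx and then vanishes in
# the limit, but dividing by Δx^2 instead leaves something finite and
# nonzero, so it could not be discarded there.
u = lambda x: x**2
v = lambda x: x**3
x0 = 1.0
for h in [1e-1, 1e-2, 1e-3]:
    du = u(x0 + h) - u(x0)
    dv = v(x0 + h) - v(x0)
    print(h, du * dv / h,     # -> 0
          du * dv / h**2)     # -> u'(x0) * v'(x0) = 6
```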

  • It seems like it doesn't generalize to arbitrary derivatives, though. For instance, if we take the example from Kevin's answer, we have $f(x + dx) - f(x) = 2x\,dx +(dx)^2$, and dividing by $(dx)^2$ we would get something strange. Now that doesn't seem to be the right description of a second derivative in any case, but if we find this first derivative, then throw away the $dx$'s, and then later take another derivative, I feel as if we would get a different answer than if we kept them. It just feels very arbitrary! And indeed I know it isn't formal, but there's some related formality... –  Sep 09 '14 at 01:28