The formalism behind integration by substitution

Question

When you are doing an integration by substitution you do the following working. $$\begin{align*} u&=f(x)\\ \Rightarrow\frac{du}{dx}&=f^{\prime}(x)\\ \Rightarrow du&=f^{\prime}(x)dx&(1)\\ \Rightarrow dx&=\frac{du}{f^{\prime}(x)}\\ \end{align*}$$

My question is: what on earth is going on at line $(1)$?!?

This has been bugging me for, like, forever! You see, when I was taught this in my undergrad I was told something along the lines of the following:

You just treat $\frac{du}{dx}$ like a fraction. Similarly, when you are doing the chain rule $\frac{dy}{dx}=\frac{dy}{dv}\times\frac{dv}{dx}$ you "cancel" the $dv$ terms. They are just like fractions. However, never, ever say this to a pure mathematician.

Now, I am a pure mathematician. And quite frankly I don't care if people think of these as fractions or not. I know that they are not fractions (but rather limits of difference quotients as the difference tends to zero). But I figure I should start caring now... So, more precisely,

$\frac{du}{dx}$ has a meaning, but so far as I know $du$ and $dx$ do not have a meaning. Therefore, why can we treat $\frac{du}{dx}$ as a fraction when we are doing integration by substitution? What is actually going on at line $(1)$?

This should be valid in basic calculus, probably not in the theory of differential forms. I'm not very sure. — , Feb 06 '14 at 01:31
When doing ordinary integration over $\Bbb R$, we are writing $f(g(x))$ as $f(u)$ and $g'(x)dx=du$. This "$dx$" and $"du"$ might have more abstract generalized meanings, but $u$-substitution is really just the chain rule and the fundamental theorem of calculus. We know how to find anti-derivatives of things of the form $f(g(x))g'(x)$, because we know how to find derivatives of things of the form $f(g(x))$. You might call into question the notation, but the mathematics behind it is rock solid. — PVAL-inactive, Feb 06 '14 at 01:43
I don't feel qualified to give a full answer, but what's going on is some deep theorems with strong hypotheses, involving pushforward measures for Lebesgue integrals, or more simply a differentiable change of variables if you're just talking about Riemann integrals. The differential notation was constructed in some sense in order to have the "fraction" cancelling property, which is why it's "OK" to think of them like that if all you care about is evaluating an integral. In truth, though, (1) is just an abuse of notation. — Joshua Pepper, Apr 03 '14 at 09:25
@JoshuaPepper If you don't feel qualified to give a full answer, could you perhaps suggest a book I could look up? Would it just be in Rudin?...(I forget if his "Principles..." looks at Riemann integrals, but I think it does?) — user1729, Apr 03 '14 at 09:31
@JoshuaPepper Also, any suggestions for better tags would be appreciated... — user1729, Apr 03 '14 at 09:45
I think that actually you will get a better answer here on SE than in any book, as this question is really about common conventions/usage of notation - what you need is an explanation of the hierarchy of concepts (I'm a bit rusty on the details, which is why I'm hesitant to rush into anything). From a pure mathematician's perspective, the cancelling property of $\frac{dx}{dy}$ and the $du=f^\prime(x) dx$ notation are just ways of expressing two theorems - the chain rule and integration by substitution. I might write up a post making the link a bit more explicit. — Joshua Pepper, Apr 03 '14 at 09:47
There's no need for any fancy or weird or deep concepts here. It's easy to phrase these things in a way that is perfectly rigorous, that does not involve manipulating $du$ or $dx$ individually. You basically just use the chain rule in reverse, to find an antiderivative. — littleO, Apr 03 '14 at 09:49
Related answer I've made regarding derivatives, but involving a book which answers your question using Non-standard analysis. I'll try to make an answer ASAP. — JMCF125, Apr 03 '14 at 11:36

score 13 · Answer 1 · answered Apr 03 '14 at 10:09

Consider evaluating $\int (3x^2 + 2x) e^{x^3 + x^2} \, dx$ (as in this Khan Academy video).

Often teachers will say, let $u = x^3 + x^2$, and note that "$du = (3x^2 + 2x) dx$". Therefore, they say, \begin{align} \int (3x^2 + 2x) e^{x^3 + x^2} \, dx &= \int e^u du \\ &= e^u + C \\ &= e^{x^3 + x^2} + C. \end{align}

However, this explanation is confusing because there's no such thing as $du$ or $dx$.

A more clear (in my opinion) and perfectly rigorous explanation is just to notice that our integral has the form $\int f(g(x)) g'(x) dx$, and use the rule \begin{equation} \int f(g(x)) g'(x) dx = F(g(x)) + C \end{equation} where $F$ is an antiderivative of $f$. This rule is clearly true, because it's nothing more than the chain rule in reverse. There's no need to use any "infinitesimals" or anything.
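The "chain rule in reverse" claim can be sanity-checked numerically for the example above. This is only a minimal sketch: the helper names (`central_difference`, `max_relative_error`) and the sample points are assumptions chosen for illustration, and a finite-difference check is evidence rather than a proof.

```python
import math

def g(x):
    # Inner function g(x) = x^3 + x^2
    return x**3 + x**2

def g_prime(x):
    # Its derivative g'(x) = 3x^2 + 2x
    return 3 * x**2 + 2 * x

def integrand(x):
    # f(g(x)) * g'(x) with f = exp
    return math.exp(g(x)) * g_prime(x)

def candidate(x):
    # F(g(x)) with F = exp, an antiderivative of f = exp
    return math.exp(g(x))

def central_difference(func, x, h=1e-6):
    # Symmetric finite-difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

def max_relative_error(points):
    # Largest gap between d/dx F(g(x)) and f(g(x)) g'(x) over the sample points
    return max(
        abs(central_difference(candidate, x) - integrand(x)) / max(1.0, abs(integrand(x)))
        for x in points
    )

print(max_relative_error([-1.0, -0.5, 0.0, 0.5, 1.0]))
```

The printed error should be tiny, reflecting that differentiating $F(g(x))$ returns $f(g(x))g'(x)$.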

littleO

+1 Right, I actually use the second way you describe, it's completely clear and not confusing. – Sawarnik Apr 03 '14 at 10:17
+1 This is pretty much what I have to say. You could also note that $\int f(g(x))g^\prime(x) dx = \int f(g(x))\frac{dg}{dx}dx = \int f(g)dg$ is where the $dg=\frac{dg}{dx}dx$ notation comes from, and could be considered the definition of the latter. – Joshua Pepper Apr 03 '14 at 10:40
Yes, this explanation is nice. The uncomfortable fact remains, though, that "what the teachers say" is what we (all?) end up doing. – leonbloy Apr 03 '14 at 15:48
@littleO I agree that it is clearer to let go of the differentials completely when doing u-substitution. However, that may work for simple integrals, while u-substitution gives you a tool to tackle more involved cases. And then $dg = \frac{dg}{dx}dx$ comes in, which bugs me a lot. Any thoughts on that? – d56 Nov 25 '20 at 12:01
How's the last equation 'clearly true'? I'm probably missing something obvious, can somebody please expand on that a bit? – Aadi Prasad Nov 01 '21 at 13:49
@AadiPrasad If you take the derivative of $F(g(x))$ using the chain rule, you get $F’(g(x)) g’(x) = f(g(x)) g’(x)$. Does that clarify it? – littleO Nov 01 '21 at 16:00
@littleO Yes! Thanks a lot – Aadi Prasad Nov 02 '21 at 03:01

score 7 · Answer 2 · answered Feb 06 '14 at 02:20

Recall that $u$-substitution is really the inverse rule of the chain rule, just like integration by parts is the inverse rule of the product rule. The essence of the chain rule is that

$$ \frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y}{\mathrm{d}u}\frac{\mathrm{d}u}{\mathrm{d}x},$$

which is why we like to write derivatives as ratios - often, when they look like they cancel, they really "do cancel," so to speak.

A better way of writing $u$-substitution is to say that $\dfrac{\mathrm{d}u}{\mathrm{d}x} = f'(x)$, though we might as well notate this as $u'(x)$, since that's what we're really doing. Then

$$ \int g(u(x))u'(x) \mathrm{d}x = \color{#F01C2C}{\int g(u(x)) \frac{\mathrm{d}u}{\mathrm{d}x}\mathrm{d}x = \int g(u) \mathrm{d}u},$$

where I've notated the important equality in red. The step in red is visibly related to the chain rule: the part that looks like it cancels really does cancel. $\diamondsuit$

The theme here is that this is valid because of the chain rule, and the notation is chosen to support the cancellation effects. The fact that people go around separating this very convenient notation is largely for different reasons, and/or because they are implying a good amount of knowledge of "differentials."

We can even more directly relate this to the chain rule by giving a proof. Consider the function

$$ F(x) = \int_{0}^x g(t)\mathrm{d}t.$$

Consider the function $F(u(x))$ and differentiate it:

$$ \begin{align} \frac{\mathrm{d}}{\mathrm{d}x}F(u(x)) &= F'(u(x)) u'(x) = \frac{\mathrm{d}F}{\mathrm{d}u}\frac{\mathrm{d}u}{\mathrm{d}x}\\ &=\left(\frac{\mathrm{d}}{\mathrm{d}u}\int_{0}^{u} g(t)\mathrm{d}t\right)\bigg|_{u=u(x)} \cdot u'(x)\\ &= g(u(x))u'(x). \end{align}$$

Then the second fundamental theorem of calculus says that

$$\begin{align} \int_a^b g(u(x))u'(x)\mathrm{d}x &= F(u(b)) - F(u(a)) \\ &= \int_{a}^{b} g(u(t))u'(t)\mathrm{d}t \\ &=\int_{a}^{b}g(u(t))\frac{\mathrm{d}u}{\mathrm{d}t}\mathrm{d}t. \end{align}$$

Of course, we also know that $\displaystyle F(u(b)) - F(u(a)) = \int_{u(a)}^{u(b)} g(t) \mathrm{d}t = \int_{u(a)}^{u(b)} g(u) \mathrm{d}u$.
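The equality of the two definite integrals can be checked numerically. The sketch below assumes a particular choice $u(x) = x^2 + 1$, $g(t) = \cos t$ on $[0, 1.5]$, and a hand-rolled composite Simpson rule (`simpson` is a hypothetical helper, not something from the answer); any smooth choice should behave the same way.

```python
import math

def simpson(func, lo, hi, n=1000):
    """Composite Simpson's rule with n (made even) subintervals."""
    if n % 2:
        n += 1
    step = (hi - lo) / n
    total = func(lo) + func(hi)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2
        total += (4 if i % 2 else 2) * func(lo + i * step)
    return total * step / 3

def u(x):
    return x**2 + 1

def u_prime(x):
    return 2 * x

def g(t):
    return math.cos(t)

a, b = 0.0, 1.5

# Left side: integrate g(u(x)) u'(x) over x in [a, b]
lhs = simpson(lambda x: g(u(x)) * u_prime(x), a, b)
# Right side: integrate g(u) over u in [u(a), u(b)]
rhs = simpson(g, u(a), u(b))
# Closed form F(u(b)) - F(u(a)) with F = sin, an antiderivative of cos
exact = math.sin(u(b)) - math.sin(u(a))

print(lhs, rhs, exact)
```

All three printed values should agree up to quadrature error, which is the content of the substitution rule with the limits changed from $a, b$ to $u(a), u(b)$.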

Why is the date of this answer Feb 6 when the question was asked 8 days ago? ಠ_ಠ — Superbus, Apr 12 '14 at 01:09
@Lucius: Magic ;p No, actually this answer was merged into this question from an earlier version. — davidlowryduda, Apr 12 '14 at 01:13
This color is so cute: #F01C2C that I'm gonna steal from you! — PPP, May 24 '14 at 04:31

fgp · Answer 3 · 2014-04-06T12:16:21.373

One way to interpret $df$ (for $f \,:\, \mathbb{R} \to \mathbb{R}$ for simplicity) is to view it as a map $$ df \,:\, \mathbb{R}\to \left(\mathbb{R} \to \mathbb{R}\right) \,:\, c \mapsto \left(x \mapsto xF_c\right) \text{.} $$ In plain English, $df$ is a map which sends each point in $\mathbb{R}$ to a linear function $\mathbb{R} \to \mathbb{R}$. For each $c$, the linear map $(df)(c) = x \mapsto xF_c$ is the best linear approximation of $f$ at the point $c$. We know, of course, that this means nothing other than that $F_c = f'(c)$; after all, that's one way to define the derivative: as the slope of the best linear approximation at the point $c$.

So what is $\frac{du}{dv}$, then? It's a quotient of maps, and if you interpret it simply point-wise, you get $$ \frac{du}{dv} = \frac{(c,x) \mapsto xU_c}{(c,x) \mapsto xV_c} = (c,x) \mapsto \frac{xU_c}{xV_c} = (c,x) \mapsto \frac{U_c}{V_c} \text{.} $$ This doesn't depend on $x$ anymore, so we may re-interpret it as a function $\mathbb{R} \to \mathbb{R}$, and if $u=u(v)$ and $v$ is an independent variable, then $U_c = u'(c)$ and $V_c = 1$, so we get $\frac{du}{dv} \,:\, \mathbb{R} \to \mathbb{R} \,:\, c \mapsto u'(c)$, i.e. $\frac{du}{dv} = u'$.
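This construction can be mimicked in code. The following is a rough sketch under stated assumptions: the slope $F_c$ is approximated by a central difference rather than computed exactly, and the names `differential` and `quotient` are hypothetical helpers introduced only for illustration.

```python
def derivative(f, c, h=1e-6):
    # Central-difference approximation of f'(c)
    return (f(c + h) - f(c - h)) / (2 * h)

def differential(f):
    """Return df: c -> (x -> x * F_c), where F_c approximates f'(c)."""
    def at_point(c):
        slope = derivative(f, c)
        return lambda x: x * slope
    return at_point

def quotient(du, dv):
    """Pointwise quotient (c, x) -> du(c)(x) / dv(c)(x); needs x != 0 and a nonzero slope for dv."""
    def ratio(c, x):
        return du(c)(x) / dv(c)(x)
    return ratio

# u depends on the independent variable v; dv is the differential of the identity map
u = lambda v: v**3
identity = lambda v: v

du_dv = quotient(differential(u), differential(identity))

# The value should not depend on x, and should approximate u'(2) = 3 * 2^2
print(du_dv(2.0, 1.0), du_dv(2.0, -3.5))
```

Both printed values should be close to $12$, illustrating that the pointwise quotient depends only on $c$ and agrees with $u'(c)$.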

By the way, usually when we notate a function value, we use \mapsto instead of \to, as in: the function $f:A\to B$ defined by $x\mapsto x^2$, or for a function of functions: $g:A\to(B\to C)$ defined as $g=x\mapsto(y\mapsto x\cdot y)$. — Mario Carneiro, Apr 03 '14 at 17:16

S.C. · Answer 4 · 2023-06-24T07:44:18.480

I would like to add a bit of commentary to LittleO's post. As someone who just recently learned the 'formal' approach to single variable integration (Spivak's Calculus), I would first tell all beginners to familiarize themselves with the implicit, shorthand notations that riddle discussions of integration: e.g. $\int_a^b f \iff \int_a^b f(x)dx \iff \int_a^b f(t)dt$

Suppose we are interested in finding the antiderivative/primitive of a function $h$...which is what is implicitly being asked when tasked to compute "$\int h$"...note the absence of explicit limits of integration...i.e. $\int_a^b h(x) dx$ is semantically distinct from $\int h$.

It may be the case that $h$ is a complicated function, so it is not immediately clear how we can find an $H$ such that $H'=h$. However, there is a trick that can occasionally work, which is derived from the following theorem:

If $f$ and $g'$ are continuous functions, then $\displaystyle \int_{g(a)}^{g(b)}f(u)du=\int_a^b[f\circ g](x)\cdot g'(x)dx $

The reason this comes in handy is that if it turns out that we can find an $f$ and $g$ such that $h=(f \circ g) \cdot g'$...AND if it turns out that $\int f$ belongs to a collection of 'easy-to-compute' / already-known-primitives, then we can apply the above theorem to deduce what $\int h$ is. Specifically, because we know that $\int f = F$, using the aforementioned theorem and the Fundamental Theorem of Calculus, we will have that $\int h = \int (f \circ g) \cdot g'=F\circ g \quad (\dagger)$, as LittleO specified...people will sometimes include the arbitrary constant $C$, but you do not have to.

So how exactly does $u$-substitution come into play here? The purpose of $u$-substitution is to help us deduce what exactly $f$ and $g$ are so that we can apply $(\dagger)$. From my experience, there are effectively two different flavors of the $u$-substitution approach, one of which is more straightforward than the other.

Method 1

This is the most straightforward $u$-substitution approach and it can be used to tackle primitive searches such as $\displaystyle \int\sin^3(x)\cos(x)dx$. For this integral, someone's work will likely look like this:

\begin{align}u&=\sin(x) \\ du&=\cos(x)dx \\\int u^3du &=\frac{u^4}{4}\\ \text{plugging} &\text{ in $\sin(x)$ for $u$}\\ \int\sin^3(x)\cos(x)dx&=\frac{\sin^4(x)}{4} \end{align}

So what exactly is going on here? Firstly, several of the equality symbols used here are extremely misleading. Such equality symbols do not really mean equality (at least not in the usual sense). I prefer to write the following:

\begin{align}u&\overset{*}=\sin(x) \\ du&\overset{*}=\cos(x)dx \\\int u^3du &=\frac{u^4}{4}\\ \text{plugging} &\text{ in $\sin(x)$ for $u$}\\ \int\sin^3(x)\cos(x)dx&=\frac{\sin^4(x)}{4} \end{align}

You will notice that I used a new symbol $\overset{*}=$. I am defining this symbol to mean "Replace the text on the right with the text on the left". In the context of our integral $\displaystyle \int\sin^3(x)\cos(x)dx$, these two replacement operations (one for $\cos(x)dx$ and one for $\sin(x)$) yield the integral $\displaystyle \int u^3du$.

So why does this work, exactly? Well, starting from $\displaystyle \int u^3 du$, let's work backwards and perform the following (remember that our symbol $\overset{*}=$ has a particular order for the text swap): $g(x)\overset{*}= u$ and $g'(x)dx\overset{*}=du$. Then we have $\displaystyle \int g(x)^3 g'(x) dx$. This looks an awful lot like our desired form $\displaystyle \int [f \circ g](x) \cdot g'(x) dx$. In fact, you should realize that this structure reveals what our $f$ function is: $f(\star)=(\star)^3$...i.e. $[f\circ g](x) \cdot g'(x)=f(g(x))\cdot g'(x)=g(x)^3 g'(x)$. Now that we have our function $f$, and it happens to be easy to integrate, we apply our $(\dagger)$ to get the appropriate answer.

Method 2

The next method can be used to evaluate integrals such as $\displaystyle \int \sqrt{1-x^2}dx$. Recalling some standard trig formulas (double angle theorem and such), someone's work will most likely look like:

\begin{align} x&=\sin(u) \\dx &= \cos(u)du \\ \int\sqrt{1-\sin^2(u)}\cos(u)du&=\int \cos^2(u)du=\int \frac{1+\cos(2u)}{2}du\\&=\frac{1}{2}u+\frac{\sin(2u)}{4}=\frac{1}{2}u+\frac{\sin(u)\cos(u)}{2}\\\text{plugging in } &\arcsin(x) \text{ for $u$} \\\int \sqrt{1-x^2}dx&=\frac{\arcsin(x)}{2}+\frac{x\sqrt{1-x^2}}{2}\end{align}

The structure is seemingly different from Method 1, where we now are involving functions of $u$ (rather than functions of $x$, as we did with Method 1). So why does this work at all?

First, consider the following argument.

Suppose $h(x)=[f \circ g](x) \cdot g'(x)$ and we know that $g$ has an inverse $g^{-1}$. There is a calculus theorem that tells us that $[g^{-1}]'(u)=\frac{1}{g'(g^{-1}(u))}$. We will keep this in our back pocket for the moment. Next, consider the composition $h(g^{-1}(u))$. Then we have that: \begin{align}h(g^{-1}(u))&=[f \circ g](g^{-1}(u))\cdot g'(g^{-1}(u)) \\&=f(u)\cdot g'(g^{-1}(u)) \end{align}

Next, let us multiply both sides by $[g^{-1}]'(u)$ and apply our pocketed theorem:

\begin{align}h(g^{-1}(u))\cdot [g^{-1}]'(u)&=f(u)\cdot g'(g^{-1}(u)) \cdot[g^{-1}]'(u)\\&= f(u)\cdot g'(g^{-1}(u)) \cdot \frac{1}{g'(g^{-1}(u))} \\&=f(u)\end{align}

Although this derivation may seem unrelated, note that we have figured out a way of manipulating $h$ such that the function $f$ is revealed to us. But this derivation is precisely what our $u$-substitution strategy is attempting to mimic (although in a shorthand manner)!

If we define $g=\arcsin$, then $g^{-1}=\sin$ (albeit, with a restricted domain). Letting $h(x)=\sqrt{1-x^2}$, we then composed $h$ with $g^{-1}$ when we wrote "$x=\sin(u)$"...i.e. $h(\sin(u))=\sqrt{1-\sin^2(u)}$. What did we do next? Well, when we wrote "$dx=\cos(u)du$", all we really did was multiply $\sqrt{1-\sin^2(u)}$ by $\cos(u)$, which is the derivative of $\sin(u)$, i.e. we multiplied $h(g^{-1}(u))$ by $[g^{-1}]'(u)$, which gives us the function $\sqrt{1-\sin^2(u)}\cdot \cos(u)$. From our above derivation, we thus conclude that $f(u)=\sqrt{1-\sin^2(u)}\cdot \cos(u)$. So, really, once again, all the $u$-substitution approach is doing is revealing the function $f$ to us (which is, hopefully, something that is comparatively easier to integrate). In accordance with $(\dagger)$, after integrating this $f$, we just need to plug in $g$ as its argument to get the appropriate answer.
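As a numerical cross-check of Method 2, the sketch below compares the integral in $x$, the integral in $u$, and the closed form on $[0, 0.8]$. The upper limit $0.8$, the midpoint-rule helper `midpoint_rule`, and the number of subintervals are all assumptions made for illustration.

```python
import math

def midpoint_rule(func, lo, hi, n=20000):
    # Composite midpoint rule with n equal subintervals
    step = (hi - lo) / n
    return step * sum(func(lo + (i + 0.5) * step) for i in range(n))

def h(x):
    # Original integrand h(x) = sqrt(1 - x^2)
    return math.sqrt(1 - x**2)

def f(u):
    # f(u) = h(g^{-1}(u)) * [g^{-1}]'(u) with g^{-1} = sin
    return h(math.sin(u)) * math.cos(u)

def closed_form(x):
    # Claimed primitive: arcsin(x)/2 + x*sqrt(1 - x^2)/2
    return math.asin(x) / 2 + x * math.sqrt(1 - x**2) / 2

b = 0.8
in_x = midpoint_rule(h, 0.0, b)             # integral of h over [0, b]
in_u = midpoint_rule(f, 0.0, math.asin(b))  # integral of f over [g(0), g(b)], g = arcsin
exact = closed_form(b) - closed_form(0.0)

print(in_x, in_u, exact)
```

The three values should agree up to quadrature error, which matches the theorem with $g=\arcsin$ and limits $g(0)$, $g(b)$.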

score 0 · Answer 5 · answered Apr 03 '14 at 09:20

When applying notions like the chain rule and substitution we treat derivatives just like fractions, but the rules are slightly bent, as the multivariable chain rule shows:

there we have $\frac{\partial f(g(t),h(t))}{\partial t}= \frac{\partial f}{\partial g}\frac{\partial g}{\partial t}+\frac{\partial f}{\partial h}\frac{\partial h}{\partial t}$, but if we cancel these down we get $\frac{\partial f(g(t),h(t))}{\partial t}=2\frac{\partial f(g(t),h(t))}{\partial t}$, which is false in general.

But in one variable, just like above, everything runs smoothly, and it is good to note that things like "$dx$" are infinitesimally small changes in $x$, so when we consider $du/dx$, we consider both "$du$" and "$dx$" as they become infinitesimally small, so we can manipulate them like fractions.
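The multivariable caveat can be made concrete with a small numerical experiment. This is a sketch under assumed choices: $f(g,h) = g\cdot h$, $g(t) = t^2$, $h(t) = \sin t$, and the evaluation point $t_0 = 1.3$ are all picked for illustration.

```python
import math

def g(t):
    return t**2

def h(t):
    return math.sin(t)

def f(a, b):
    # f(g, h) = g * h, so df/dg = h and df/dh = g
    return a * b

def composite(t):
    return f(g(t), h(t))

def central_difference(func, t, step=1e-6):
    # Symmetric finite-difference approximation of func'(t)
    return (func(t + step) - func(t - step)) / (2 * step)

t0 = 1.3

# Derivative of t -> f(g(t), h(t)) computed directly
total = central_difference(composite, t0)
# The two chain-rule terms: (df/dg) * g'(t) and (df/dh) * h'(t)
term_g = h(t0) * central_difference(g, t0)
term_h = g(t0) * central_difference(h, t0)

print(total, term_g, term_h, term_g + term_h)
```

The sum of the two terms matches the total derivative, while neither term alone does, so "cancelling" each product down to the full derivative cannot be right.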

I know that the notation represents the limit of some fractions, but I suppose my question is "why can we manipulate this limit like it was a fraction?!?" (or rather, the "notation representing this limit...") — user1729, Apr 03 '14 at 09:44
We can because when we consider these objects, they are changing, becoming smaller and smaller, so they act like non zero objects, allowing us to manipulate them like fractions. But like I said there are "paradoxical" examples. — Ellya, Apr 03 '14 at 10:06

score 0 · Answer 6 · edited Apr 03 '14 at 09:46

Consider the geometrical interpretation: you have a right triangle with legs $\Delta x$ and $\Delta u$, and $f'$ is the slope $k=\tan(\alpha)$, so you get $f'=k=\tan(\alpha)=\frac{\Delta u}{\Delta x}$. Now let $\Delta x \rightarrow 0$ and you get the definition of the derivative... So $du$ and $dx$ have a meaning, and doing something like $dx=\frac{du}{f'(x)}$ does make sense.

user1729

answered Apr 03 '14 at 09:36

MarkisaB

I know that the notation represents the limit of some fractions, but I suppose my question is "why can we manipulate this limit like it was a fraction?!?" (or rather, the "notation representing this limit...") – user1729 Apr 03 '14 at 09:44

score 0 · Answer 7 · edited Apr 13 '17 at 12:20

I know that they are not fractions [...]

Well, by Non-standard analysis (following a book I referred to in a comment about a similar answer), that's where you're wrong. And that is the premise supporting the whole question; if you said the opposite, you wouldn't've made this question.

My question is: what on earth is going on at line (1)?!?

First, the $u$-substitution, while used in integration, is on its own an operation of differentiation. As differentiation is a function (on functions), and both sides are equal, the differentials must be equal. It is by definition of any function that $a=b\Rightarrow f(a)=f(b)$.

So, what is differentiation? It is the infinitesimal variation of the tangent of a function on a point. What you might be thinking is: why make a distinction between the variation on the tangent and on the function itself if they're coincident when zooming in enough? The answer is this allows us to both define the derivative as a (hyper)real fraction (pun intended), and not simply and informally discard the smaller infinitesimals.

An accurate image to illustrate this is the following taken from the book:

As the differential is on the tangent, and to know the tangent one should know the derivative, the former's definition is $dy = f'(x)\ dx$. Note that everything here is a number, and by the transfer principle, the usual rules of algebra apply.
