Rigorous treatment of integration by parts in a Calculus 1 course

Question

I will be teaching Calculus 1 soon and I am trying to find some justifications for fishy arguments that are widespread out there.

In a standard Calculus 1 course, the following concepts are presented to students.

Antiderivative: A function $F$ is called an antiderivative of a function $f$ in an interval if $F'=f$ in that interval.

Indefinite integral: the family of all the antiderivatives of a function $f$ is called indefinite integral of $f$ and is denoted by $\int f(x)dx$. Having shown that the difference of any two antiderivatives of the same function is constant, if $F$ is an antiderivative of $f$, then we write $\int f(x)dx=F(x)+C$, where $C$ is a constant.

The problem I see is that some textbooks define the differential in a very vague manner and then foster the use of the equality $dy=y'dx$ without justification.

For example, when presenting integration by parts, everything starts fine with the product rule for two differentiable functions $u$ and $v$: $$(uv)'=u'v+uv'\implies uv'=(uv)'-u'v$$ which implies that $$\int u(x)v'(x)dx=u(x)v(x)-\int u'(x)v(x)dx\quad\quad\quad (A)$$

The problem starts with the manipulation of the dummy symbols in the notation of the indefinite integral by the substitutions $dv=v'(x)dx$ and $du=u'(x)dx$ resulting in the popular formula: $$\int udv=uv-\int vdu\quad\quad\quad (B)$$

When I look at the definition of indefinite integral, equality (A) is well-defined but (B) is not.

Into practice: Calculate $\int 2x\cos(x)dx$.

A student using (A) will write: let $u(x)=2x$ and $v'(x)=\cos(x)$. Then $u'(x)=2$ and $v(x)=\int \cos(x)dx=\sin(x)$ (here understanding that we need just one antiderivative, any one).

Then by (A) we have: $\int 2x\cos(x)dx=2x\sin(x)-\int 2\sin(x)dx=2x\sin(x)+2\cos(x)+C$.
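As a sanity check (an added illustration, not part of the argument itself), the result can be confirmed straight from the definition of antiderivative by differentiating the right-hand side with the product rule: $$\frac{d}{dx}\bigl(2x\sin(x)+2\cos(x)+C\bigr)=2\sin(x)+2x\cos(x)-2\sin(x)=2x\cos(x).$$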

When using (B), students set $u=2x$ and $dv=\cos(x)dx$. Then they compute $du=2dx$ and $v=\sin(x)$, and finally substitute the pieces into (B) as if they were TeX processors. I mean, the method relies on the syntax of (B), not on the definition of indefinite integral.

Question: what is the mathematical justification to accept the use of (B)? The justification should be at the level of students taking Calculus 1.

Remark: Note that substitutions of the type $dy=y'dx$ are not necessary for the substitution techniques of integration in a Calculus 1 course.

Indeed, if $F'=f$, then the chain rule shows: $$(F\circ g)'(x)=f(g(x))g'(x)$$ so by the definition of indefinite integral $$\int f(g(x))g'(x)dx=F(g(x))+C,$$ or equivalently, $$\int f(g(x))g'(x)dx=\left.\int f(u)du\right|_{u=g(x)}.$$
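For instance, take the illustrative choices $f(u)=\cos(u)$, $F(u)=\sin(u)$ and $g(x)=x^2$ (a concrete instance picked only for this example). The chain rule gives $$\bigl(\sin(x^2)\bigr)'=\cos(x^2)\cdot 2x,$$ so by the definition of indefinite integral $$\int 2x\cos(x^2)dx=\sin(x^2)+C,$$ with no appeal to an identity of the form $du=g'(x)dx$.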

Update: Thanks to the answers posted, I realized that my concern was justified: (B) is (apparently) only justified after considering contents that are not part of a calculus 1 course, say, through Stieltjes integrals or differentials. Thank you for the well-presented answers and for the comments and resources presented in the comment sections.

I am well aware that it would not be good to hide (B) from my students since, as was pointed out in the comments, students will face it sooner or later and they should be prepared for it. That is why I posted this question. I think I will present and mostly use (A) during the course. I will mention (B), stating that it is true but that we do not have the tools to prove it, and that for now it can be used as a notational shortcut for (A). That way they have a way to justify steps that appear in many calculus textbooks, steps that are laid out without a proper justification (and you wonder why people do not understand mathematics).

I have never seen it done like you show in $(B)$, and I come from a physics background. Why do you need justification to accept it? I don't think I would, for precisely the reasons you mention. — Vercassivelaunos, Jan 23 '22 at 19:48
When I was taught these things the first time ever, I was told that $du$ is just a shortcut for $u'(x)dx$ and, similarly, $dv$ is a shortcut for $v'(x)dx$, so the formula (B) is just a shorter way to write the formula (A). We did not dwell upon that, and have proceeded to solve examples, knowing that, whatever we did, we could rewrite it in the "(A)" form had we wanted to do so. — , Jan 23 '22 at 19:56
Here is a short set of notes that may be useful. https://ee.usc.edu/stochastic-nets/docs/calculus-review.pdf — Michael, Jan 23 '22 at 20:15
I find (A) misleading. It is a formula for indefinite integrals, but you dropped all the $C$s even though they can end up being different when evaluating LHS and RHS separately. Also, in a definite integral, where these formulas actually come to life, you will need two of the terms in the middle, but only one of either integral. The reason for these complications is that your formulas (A) and (B) are really properties of derivatives expressed in terms of antiderivatives. As for (B) and the identification of $du$ and $u'(x)dx$, that's perfectly justified for monotonic differentiable functions. — tobi_s, Jan 24 '22 at 05:47
@tobi_s Your concerns are misguided. Since the integral symbol is explicit, the antidifferentiation constants are not written: they are only written after evaluation. That is how that usage of the symbol is intended to work in a calculus 1 course curriculum. As for your last sentence, you missed the point of the question. OP made it clear that justifications for the identification exist, but what they are asking for are justifications for the identification at the level of a calculus 1 student, and that definitely does not exist.... (continued) — Angel, Jan 24 '22 at 13:04
@tobi_s Outside of measure theory and differential geometry, such justifications do not exist, and neither of these is at the level of a calculus 1 student, since a calculus is a prerequisite for studying the content/material for those. Also, your point about expressing a property of derivatives in terms of antiderivatives is moot. Anything that is expressible in terms of one can be expressed in terms of the other. This is the point of solving differential equations, really, and a formal-logical equivalence exists between the two. — Angel, Jan 24 '22 at 13:05
"the family of all the antiderivatives of a function f is called indefinite integral of f" That's not quite true. Discontinuous functions can have indefinite integrals without having antiderivatives. "replace the pieces into (B) as they were TeX processors." What does this mean? Do you mean "as if they were TeX processors? — Acccumulation, Jan 25 '22 at 00:29
@Angel I'm not a representative sample, but I remember stumbling about this specific formula and messing up the insertion of the limits until I understood what it actually means. I think it is more confusing than beneficial. I also don't think it meets the expectation for antiderivative formulas where the usual form is $\int f(x) dx = g(x) + C$ with $g(x)$ not containing integrals. In fact, (A) is a totally different type of equation. In that sense (B) is actually the "better" equation as it expresses the functional content with no placeholder $x$ that means different things on LHS and RHS. — tobi_s, Jan 25 '22 at 06:53
@tobi_s $(A)$ is of the type $$\int{f(x)}\,\mathrm{d}x=\int{g(x)}\,\mathrm{d}x+h(x).$$ That is all. Your last sentence is unclear. What do you mean by the "placeholder $x$ that means different things on LHS and RHS"? It means exactly the same thing on both sides, so I have no clue as to what you are talking about. — Angel, Jan 25 '22 at 12:52
@Acccumulation I guess you are not using the definition of indefinite integral I defined in the post. Thanks for the grammar correction. I have fixed it! — Chilote, Jan 25 '22 at 13:57
@Angel Let's go back to the usual form $\int f(x) dx=F(x) + C$. This means either (1): $F'(x)=f(x)$ or (2): $\int_a^b f(x)dx = F(b) - F(a)$. In case (1) the $x$ is just a notational complication with no meaning. But note in case (2) how $x$ still appears on the LHS but not the RHS, so again it can't be the same thing. Now try to apply either (1) or (2) to form A from above. You suddenly have to do different things to the terms on the RHS, because the $x$ mean different things within the same side of the equation. Functional relations should be expressed in terms of functions. — tobi_s, Jan 27 '22 at 00:31
@tobi_s I have no idea of what you are trying to say. As stated, the equation is of the type $$\int{f(x)}\,\mathrm{d}x=\int{g(x)}\,\mathrm{d}x+h(x),$$ which simply means that $$h'(x)=f(x)-g(x).$$ There is no "doing different things on both sides." It is all just shorthand for what we are already familiar with. — Angel, Jan 27 '22 at 13:05
Can someone move this to chat? The website is not giving me the option to do that. — Angel, Jan 27 '22 at 13:06
@Angel If you can't see that the different $x$s behave differently, then I don't think that I could explain it to you in a chat. Note how much simpler the manipulation rules for $\int\! f = h + \int\! g \iff f = h' + g$ are -- because there are no terms that are just meaningless or (IMO) confused notation. (BTW I would still prefer a $+C$ in the form with integrals, but that's a slightly different issue.) — tobi_s, Jan 27 '22 at 13:39
@tobi_s I remain unconvinced, as the equation you just used is literally equivalent to the one I presented. — Angel, Jan 27 '22 at 16:05
@Angel: of course it's equivalent. (Not "literally" equivalent, though.) My point was that the A) way (i.e. yours) is confusing. But look, I'm not going to convince you, you already said that you don't understand what I'm trying to say, and your latest post convinced me that that assessment was correct. Also, you are not the OP, you're just someone who jumped in here to tell me I'm misguided, so I will stop here. — tobi_s, Jan 28 '22 at 07:28

score 15 · Answer 1 · edited Jan 27 '22 at 13:23

I want to say that I strongly disagree with the view presented by OP and further I think that we are doing a disservice to students by hiding the approach B from them.

First we have the notation $$\frac{dy}{dx}=y^{\prime}$$ I think you do not disagree with this notation, even though it is not really a fraction. We can then write the expression as, $$dy=y^{\prime}dx$$ and this is just equivalent notation. Or even better to write it as $$dy=\frac{dy}{dx}dx.$$ I would present this to the students as simply notation. In this sense equation B is the same as equation A in an alternative notation.

The main point, however, is that expression B is much easier for students to remember and to use in calculation, and it brings significant simplifications. In my experience working with students, once you get them to accept this $dy$ notation (which may require a little practice), they make rapid progress in applications of integration. Indeed, many are confused by the standard A approach they are given, and it hampers their progress. For example, this is how I write the integration of $\int x^2 \cos x dx$: $$\int x^2\cos x dx=\int x^2d(\sin x)$$ $$=x^2\sin x-\int \sin x d(x^2)$$ $$=x^2\sin x-\int 2x\sin x dx$$ $$=x^2\sin x+\int 2x d(\cos x)$$ $$=x^2\sin x+2x\cos x-\int 2\cos x dx$$ $$=x^2\sin x+2x\cos x- 2\sin x+C$$

This is very streamlined and it makes difficult problems easier to solve.
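One way to gain confidence in the result of the calculation above is a quick numerical check. The sketch below is only an illustration, assuming plain Python with the standard `math` module; the helper names `integrand`, `antiderivative`, and `simpson` and the test interval $[0,2]$ are hypothetical choices, not part of the answer. It compares a composite Simpson approximation of $\int_0^2 x^2\cos x\,dx$ with the difference of the antiderivative at the endpoints.

```python
import math

def integrand(x):
    """The integrand x^2 cos(x) from the worked example."""
    return x * x * math.cos(x)

def antiderivative(x):
    """Candidate antiderivative found by integrating by parts twice."""
    return x * x * math.sin(x) + 2 * x * math.cos(x) - 2 * math.sin(x)

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n must be even)."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior nodes get weight 4 (odd index) or 2 (even index).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a, b = 0.0, 2.0  # arbitrary test interval (an assumption of this sketch)
numeric = simpson(integrand, a, b)
exact = antiderivative(b) - antiderivative(a)
print("difference between the two values:", abs(numeric - exact))
```

If the antiderivative is right, the printed difference should be tiny compared with the value of the integral; a sign slip in one of the integration-by-parts steps would show up as a visible discrepancy.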

There is also another issue: how will you treat substitution? Will you not write $x=f(u)$ and so $$dx=f^{\prime}(u)du?$$ Thus you will have to use the $dy$ notation in any case, so why not harmonize the two methods of integration? There is also a fact that you will have to reconcile yourself with: students will hopefully continue in mathematics and will encounter the other notation, so they should be prepared for it.

As to the question of rigor there are two options: the Stieltjes integral as noted in the previous answer (which is a good idea to indicate), or the idea of differential forms, too advanced but it is rigorous.

For me, I would use the notation of the B form almost exclusively, with the following proof: $$(uv)^{\prime}=uv^{\prime}+vu^{\prime}$$ $$uv=\int uv^{\prime}dx+\int vu^{\prime}dx$$ $$uv=\int udv+\int vdu$$ And then I would provide many exercises to promote facility with manipulation of this notation. But, of course, people have been arguing about how to teach calculus for decades.

Comments are not for extended discussion; this conversation has been moved to chat. — Xander Henderson, Jan 24 '22 at 19:12
@Chilote First off, I entirely agree with Rene's answer (+1). One other thing to keep in mind is that, unless you teach someplace where the only degree being offered is in math, many of your students may not take another more advanced course to fully formalize differentials later. You would do those students a great disservice by not covering it at this point at all. — dxiv, Jan 24 '22 at 19:18
@dxiv If the students are never going to be exposed to differentials formally, then there is really no reason to expose them to the concept informally at all in the first place, since it is not a concept they will encounter in applications. — Angel, Jan 25 '22 at 12:54
@Angel Not a concept they will encounter in applications ? Look at books on differential equations, they are full of differentials, read a book on thermodynamics, more differentials. They are everywhere. — Rene Schipperus, Jan 25 '22 at 20:01
@Angel I beg to differ. There are entire branches of applied math, engineering etc that make heavy use of this "shorthand notation", but don't require (and therefore students won't learn or care about) more advanced analysis or formal differentials. My point was that the choice is not between "teach this now vs. later, at a better time math-wise", but rather "teach this now, or leave some students never learning it". A course like Calculus 1 cannot be built upon the assumption that everybody will take another math course in the future to catch up. — dxiv, Jan 26 '22 at 07:21
@dxiv As someone who studies applied maths and physics for a living, I think you overestimate the importance of the shorthand notation for these fields. In many instances, in these applications, we even omit the differentials altogether, making that shorthand pointless. — Angel, Jan 26 '22 at 12:46

score 7 · Answer 2 · answered Jan 23 '22 at 20:15

(B) is also well defined whenever the integrals are Stieltjes integrals, i.e., limits of sums of the form $$ S_1=\sum_{i=1}^nu(\xi_i)(v(x_i)-v(x_{i-1}))\,,\quad\quad S_2=\sum_{i=1}^nv(\eta_i)(u(x_i)-u(x_{i-1}))\,, $$ taken over partitions $0=x_0<x_1<\dots<x_n=x$. The Stieltjes integral does not depend on the choice of $\xi_i,\eta_i\in[x_{i-1},x_i]\,.$ Therefore we can choose $\xi_i=x_i$ and $\eta_i=x_{i-1}\,.$ Then the cross terms cancel and the sum telescopes: $$ S_1+S_2=\sum_{i=1}^n\bigl(u(x_i)v(x_i)-v(x_{i-1})u(x_{i-1})\bigr)=u(x)v(x)-u(0)v(0)\,. $$ Taking the limit we have shown $$ \int_0^xu\,dv+\int_0^xv\,du=u(x)v(x)-u(0)v(0)\,. $$ The interval $[0,x]$ is obviously arbitrary, therefore $\int u\,dv+\int v\,du=uv\,.$
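The telescoping step can be watched numerically. What follows is a minimal sketch, assuming plain Python and a uniform partition; the function name `stieltjes_sums`, the choices $u(x)=2x$, $v(x)=\sin x$, and the interval $[0,1.5]$ are hypothetical and only serve the illustration. With right endpoints in $S_1$ and left endpoints in $S_2$, the sum $S_1+S_2$ should match the boundary term for every $n$ up to rounding, while $S_1$ alone only approaches $\int u\,dv$ as $n$ grows.

```python
import math

def stieltjes_sums(u, v, a, b, n):
    """Return (S1, S2) on a uniform partition of [a, b] with n pieces.

    S1 uses the right endpoint xi_i = x_i (approximates the integral of u dv).
    S2 uses the left endpoint eta_i = x_{i-1} (approximates the integral of v du).
    """
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    s1 = sum(u(xs[i]) * (v(xs[i]) - v(xs[i - 1])) for i in range(1, n + 1))
    s2 = sum(v(xs[i - 1]) * (u(xs[i]) - u(xs[i - 1])) for i in range(1, n + 1))
    return s1, s2

def u(x):
    return 2 * x          # illustrative choice, not from the answer

v = math.sin              # illustrative choice, not from the answer
a, b = 0.0, 1.5           # illustrative interval

for n in (10, 100, 1000):
    s1, s2 = stieltjes_sums(u, v, a, b, n)
    boundary = u(b) * v(b) - u(a) * v(a)
    # The first number is rounding noise; the second shrinks as n grows.
    exact_u_dv = 3 * math.sin(1.5) + 2 * math.cos(1.5) - 2
    print(n, s1 + s2 - boundary, s1 - exact_u_dv)
```

The value `exact_u_dv` is $\int_0^{1.5}2x\cos x\,dx$ computed from the antiderivative $2x\sin x+2\cos x$ found in the question.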

Answer 3 · answered Jan 23 '22 at 21:17 · peek-a-boo

Here's my honest suggestion: just avoid (B) altogether if you want to be super rigorous but don't want to go too deep down the rabbit hole; otherwise, just present both ways of writing things down and emphasize that both are talking about undoing the product rule. At this stage people don't even define the symbol $d(\text{anything})$ carefully, so there's no point trying to make rigorous sense out of it, or of any theorems which follow from it. Often people treat the symbols $dx$ and $\int(\cdots)\,dx$ as merely a pair of symbols that "go together", like a "." at the end of a sentence.

Anyway, if you want to make things careful, we must first start with a careful definition of $d$ and its effect on functions.

Definition $1$.

Let $I\subset\Bbb{R}$ be an interval (say non-empty and open for simplicity), and let $F:I\to\Bbb{R}$ be a differentiable function. $dF$ shall mean the mapping $I\to \text{Hom}(\Bbb{R},\Bbb{R})$ such that for each $a\in I$, $dF_a:\Bbb{R}\to\Bbb{R}$ is the linear transformation defined as $dF_a(h):= F'(a)\cdot h$.

The symbol $x$ will now denote the inclusion function $x:I\to \Bbb{R}$, defined by setting for each $a\in I$, $x(a):= a$. This is clearly also a differentiable function on $I$, so according to the above definition, we can consider the object $dx$. One can then prove by unwinding definitions that $dF=F'\,dx$, meaning that for all $a\in I$ and all $h\in \Bbb{R}$, $dF_a(h)=F'(a)(dx)_a(h)=F'(a)\cdot h$.
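As a concrete instance (an illustrative choice of $F$, not something fixed by the definition), take $F(x)=x^2$ on $I=\Bbb{R}$. Then $F'(a)=2a$, so $$dF_a(h)=2a\cdot h,\qquad (dx)_a(h)=1\cdot h=h,$$ and hence $dF_a(h)=2a\,(dx)_a(h)$ for all $a$ and $h$, i.e. $d(x^2)=2x\,dx$ as an equality of maps $I\to\text{Hom}(\Bbb{R},\Bbb{R})$.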

This definition of $d$ shouldn't be introduced with the sole intention of making "indefinite integration rigorous". Rather, the idea of $d$ should be introduced when teaching differential calculus, because it really drives home the idea of local linear approximations (which is really the essence of differential calculus); maybe you'd like to read this answer for a few extra comments.

Definition $2$.

A differential $1$-form on an interval $I\subset\Bbb{R}$ is a mapping $\omega:I\to\text{Hom}(\Bbb{R},\Bbb{R})$. So, for each $a\in I$, we have a linear transformation $\omega_a:\Bbb{R}\to\Bbb{R}$.

Now, to each $\omega$, we can define a corresponding function $f:I\to\Bbb{R}$ as $f(a):=\omega_a(1)$. Thus, based on the definition of $dx$, you can write $\omega=f\,dx$. So, every $1$-form $\omega$ can be written as $\omega=f\,dx$ for some unique $f$, and conversely given any function $f:I\to\Bbb{R}$, we get a corresponding $1$-form $f\,dx$.

So, we now have the vocabulary of a differential 1-form, and of the differential/exterior derivative of a differentiable function $F:I\to\Bbb{R}$. The object $dF$ is thus an example of a differential 1-form.

You can prove that $d$ obeys some nice rules:

$d(F+G)=dF+dG$
$d(FG)=(dF)\cdot G + F\cdot (dG)$.
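The second rule is just the ordinary product rule evaluated on an increment $h$. A sketch, assuming that a function multiplies a $1$-form pointwise, i.e. $(G\cdot\omega)_a(h):=G(a)\,\omega_a(h)$: for any $a\in I$ and $h\in\Bbb{R}$, $$d(FG)_a(h)=(FG)'(a)\cdot h=\bigl(F'(a)G(a)+F(a)G'(a)\bigr)\cdot h=G(a)\,dF_a(h)+F(a)\,dG_a(h).$$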

The problem of indefinite integration can thus be described in this language as follows:

Antiderivative/Primitive. Let $\omega$ be a differential $1$-form on an interval $I$. We say $\omega$ has an anti-derivative/primitive on $I$ if there is a differentiable function $F:I\to\Bbb{R}$ such that $dF=\omega$.

The set of all primitives of $\omega$ shall be denoted $\int \omega$, i.e $\int\omega=\{F\,| \text{$F:I\to\Bbb{R}$ is differentiable and $dF=\omega$}\}$. Now, we define the sum of such sets to be the set of all possible sums of individual functions. Define scalar multiplication similarly. Then, you can prove things like

for any differentiable function $F:I\to\Bbb{R}$, $\int dF=\{F+c\,:\, c\in\Bbb{R}\}$ (this requires the mean-value theorem and the fact that $I$ is connected).
If $\omega,\eta$ are differential $1$-forms on $I$ which have primitives, then so does $\omega+\eta$ and in this case, $\int (\omega+\eta)=\int\omega+\int \eta$.
If $\omega$ is a differential $1$-form on $I$ which has a primitive then for any $c\in \Bbb{R}$, so does $c\omega$. In this case, $\int c\omega=c\int\omega$.

So, now you can interpret integration by parts for $1$-forms as the reverse of the product rule: if $u,v:I\to\Bbb{R}$ are differentiable functions then $\int u\,dv= \int[d(uv)-v\,du]=\int d(uv)-\int v\,du= \{uv+C\,:\, C\in\Bbb{R}\}-\int v\,du$, or by slight abuse of notation, just $uv-\int v\,du$.
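For example, with the illustrative choices $u=x$ (the inclusion function) and $v=e^x$ on $I=\Bbb{R}$, where $dv=e^x\,dx$ and $du=dx$, the formula reads $$\int x\,d(e^x)=\{xe^x+C\,:\,C\in\Bbb{R}\}-\int e^x\,dx=\{xe^x-e^x+C\,:\,C\in\Bbb{R}\},$$ which in the usual Calc 1 notation is $\int xe^x\,dx=xe^x-e^x+C$. Differentiating confirms it: $(xe^x-e^x)'=e^x+xe^x-e^x=xe^x$.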

Remarks.

The above is a relatively self-consistent presentation of $d$, integration by parts, and so on.

Is the justification at the level of Calc 1? Well, one could definitely introduce the definitions in a calc 1 course (assuming the students have seen the concept of a function as a "rule" between two sets, not just between subsets of reals). So in this regard I would say yes, it is at the level of a calc 1 student.

The more important question however is whether one should introduce these definitions at the level of calc 1? Here I would very strongly say NO. One should not introduce these definitions, unless you have a group of very theoretically-minded and curious students. Why do I say this? Well, the concept of a differential form is undoubtedly very important in higher math, but when the domain is an interval $I\subset\Bbb{R}$, this makes things needlessly complicated, and mainly because linear algebra in one-dimension is very trivial: if $V$ is a vector space over a field $\Bbb{F}$, then we have a canonical isomorphism $\text{Hom}(\Bbb{F},V)\cong V$, the isomorphism being "evaluation at $1$". Above, we have $\Bbb{F}=V=\Bbb{R}$. Because of this fact, it suffices to deal only with functions between subsets of $\Bbb{R}$ (i.e just with functions $F$ and their derivatives $F'$) without dealing with more complicated target spaces (and hence with differential forms $\omega$ and $dF$).

Just to be clear, it is not that integration is trivial; rather differential calculus becomes much simpler because we're only dealing with an open subset $I\subset\Bbb{R}$.

Also, every textbook aimed at this level only introduces functions and their derivatives, so it's best to stick with that, and just emphasize that integration by parts is the product rule with a slight rearrangement: $uv'=(uv)'-vu'$.

Well, I obviously disagree with you entirely: an undue stress on rigor in a calc. 1 course will alienate the students. Also, and people forget this entirely in learning modern mathematics, you can't understand the solution without understanding the problem. — Rene Schipperus, Jan 23 '22 at 21:07
@ReneSchipperus I've read your answer, and I don't think we're actually in too much disagreement (see my remarks at the end). I completely agree that we shouldn't present things the way I have written them to every calc 1 student. It's too many layers of definitions just to make a few calculations completely rigorous. I only mentioned it because OP asked "what is the mathematical justification for the use of (B)", so I presented just one of the possible mathematically rigorous reasons. — peek-a-boo, Jan 23 '22 at 21:09
@ReneSchipperus for the purposes of pedagogy, I think a healthy balance of both ways of writing things, $\int u(x)v'(x)\,dx=u(x)v(x)-\int u'(x)v(x)\,dx$ and $\int u\,dv=uv-\int v\,du$, will be beneficial; and also the idea that for differentiable functions $df=f'\,dx$, be it heuristic motivation or whatever. Students should then be left to decide which way of thinking they like better (I of course prefer things the way I wrote them, but I of course understand that not everyone is like me, and that many have different ways of learning the same concept). — peek-a-boo, Jan 23 '22 at 21:12
It's something calculus teachers argue about. I think students don't like an emphasis on rigor; they see it as pedantic and useless. We should still present a rigorous version of the calculus, but at the same time not go overboard. Also we must take care to prepare them for later ideas. — Rene Schipperus, Jan 23 '22 at 21:16
Besides are we now going to present rigorous proofs of Gauss, Green, and Stokes ? I am for that. — Rene Schipperus, Jan 23 '22 at 21:19
@ReneSchipperus it all of course depends on who we're teaching it to. anyway, the purpose of this answer of mine is just to provide OP with one possible way of "rigorizing things". It is completely up to them whether they wish to adopt this or not (though imo they shouldn't teach this stuff to 1st year students because they're just not ready yet, but anyway, that's something for OP to decide). as for personal opinions, yes I especially enjoyed my multivariable calculus course following Spivak and the proof of Stokes; but of course mileage varies. — peek-a-boo, Jan 23 '22 at 21:22
I think some kind of definition of a differential algebra as a one dimensional module over the differentiable functions with generator $dx$ and rules like $dy=y^{\prime} dx$ might be possible and not that difficult at a calc level. There is such an idea in algebra, over say $k(x_1, \ldots , x_n)$ so maybe a simplification of this would make everyone happy. — Rene Schipperus, Jan 23 '22 at 21:34
@ReneSchipperus An appropriate level of rigor is necessary in every mathematics course, calculus is no exception. Nothing about it is alienating, as long as the teacher does it correctly. I agree with you we should not go overboard, but no one here is suggesting we should make it more rigorous than necessary. OP is not even suggesting we make it as rigorous as it would be in a real analysis course. You said that students do not like rigor, as they find it pedantic and useless. But pedantry is an attitude, and is not inherent to the rigor itself. — Angel, Jan 24 '22 at 16:51
@ReneSchipperus And in my experience, I know of many students of calculus, current or former, who actually disliked the lack of rigor in calculus classrooms, and believe that the lack of rigor and the amount of handwaving led to unnecessary confusion about concepts that later on become trivial to them. Gauss, Green, and Stokes are not theorems encountered in a calculus 1 course, so this is completely irrelevant to the discussion: a red herring, if you will. — Angel, Jan 24 '22 at 16:53
@ReneSchipperus Finally, you mentioned a rigorous method for introducing differentials, in terms of modules over differentiable functions. But calculus 1 students are less equipped to understand the concept of a module, which requires an intermediate-level background in linear algebra and abstract algebra, than they are to understand differential forms, though that is not to say that they are sufficiently prepared for either. Abstract algebra is taught to students who already took calculus, at least in most universities I know. — Angel, Jan 24 '22 at 16:56
@ReneSchipperus Preparing students for future concepts is different than throwing at them notational conventions and concepts they lack the prerequisites for. — Angel, Jan 24 '22 at 16:57
Can someone move this to chat as well? Thank you and sorry for the inconvenience. — Angel, Jan 25 '22 at 12:56

score 2 · Answer 4 · answered Jan 27 '22 at 00:19

I offer that the main purpose of beginning courses in calculus is to empower science and engineering students (not math majors) with needed tools to do their work. And that work is very far removed from mathematical rigor.

Universities have a series of courses called "Advanced Calculus" in which mathematical rigor is emphasized. And then a series called "Real Analysis" that goes even deeper.

richard1941

Per Feynman, a physicist: “The poor mathematician has no guide but precise mathematical rigor and care in the argument” – Quanto Jan 27 '22 at 13:53
And mathematical rigor is a moving target. What was rigorous 200 years ago is laughable today. – richard1941 Jan 28 '22 at 18:58
