What I’m about to present is only one answer out of several (it’s the version I tell myself, but keep in mind there are other ways of answering your question), and it takes hindsight into account rather than Lebesgue’s original motivations.
The “Riemann integral is bad with limits” route:
One of the common responses to this is that Riemann integrals cannot handle functions like $\chi_{\Bbb{Q}\cap[0,1]}$ (the function defined on $[0,1]$ which is $1$ on the rationals and $0$ otherwise). And sure, this function is not Riemann-integrable, but I wouldn’t consider this alone as a true failure. So what if we can’t handle a pathological case, maybe it’s just so bad it doesn’t deserve to be handled.
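To see the failure concretely (this is the standard computation): every subinterval of any partition $P=\{0=x_0<x_1<\dots<x_n=1\}$ contains both rationals and irrationals, so the upper and lower Darboux sums are stuck apart:
$$U(\chi_{\Bbb{Q}\cap[0,1]},P)=\sum_{i=1}^n 1\cdot(x_i-x_{i-1})=1,\qquad L(\chi_{\Bbb{Q}\cap[0,1]},P)=\sum_{i=1}^n 0\cdot(x_i-x_{i-1})=0.$$
Hence the upper and lower integrals are $1$ and $0$, and the Riemann integral does not exist; the Lebesgue integral, by contrast, is simply $0$, since $\Bbb{Q}\cap[0,1]$ is countable and hence has measure zero.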
I’d say the more significant reason is that various function spaces equipped with integral norms are not complete. For instance, the space $C([a,b])$ of continuous real-valued functions on $[a,b]$ (a similar issue arises with $C^k([a,b])$), equipped with, say, the inner product $\langle f,g\rangle=\int_a^b fg$, is not complete. Likewise, the larger space $\mathcal{R}([a,b])$ of Riemann-integrable functions is not complete for this inner product. This was a huge issue in the study of PDEs, particularly Laplace’s/Poisson’s equation; if you’re interested, you should read up on the history of the Dirichlet problem to see the various ups and downs (Dirichlet introduced the idea, Riemann used it in complex analysis, then I believe it was Weierstrass who pointed out a flaw in the argument which essentially boils down to lack of completeness, and then in the 20th century this was finally fully handled by, I think, Hilbert, and later on with Sobolev spaces, etc.).
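To make the incompleteness concrete, here is the standard example on $[-1,1]$: the truncated ramps $f_n(x)=\max(-1,\min(1,nx))$ are continuous and form a Cauchy sequence in the norm $\|f\|=\langle f,f\rangle^{1/2}$, since $f_m$ and $f_n$ agree outside $[-1/n,1/n]$ for $m>n$, so
$$\|f_m-f_n\|^2=\int_{-1}^1|f_m-f_n|^2\leq\int_{-1/n}^{1/n}1=\frac{2}{n}\to 0.$$
But the only candidate limit is the sign function, which is not continuous, so this Cauchy sequence has no limit in $C([-1,1])$.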
The above historical illustration with PDEs is just an example. Another historical example where there used to be lots of handwaving is Fourier analysis (which I don’t want to get into now). Anyway, the general message is that completeness, and, taking one step further, good behavior under limiting processes, are EXTREMELY important in analysis. So much of higher-level analysis explicitly relies on them. Even at the very basic level, we recognized that the rational numbers are insufficient, and that’s why we came up with the concept of real numbers. If our very number system is expected to be complete, why not our fancy (actually not really fancy… they’re very naturally occurring) function spaces?
Ok, so at this stage, one might say that we could just consider the abstract completion of these spaces with respect to appropriate norms, but that is unsatisfactory: the completion is abstract, not realized as a set of functions. Furthermore, it doesn’t fully address our concerns with the Riemann integral, which plays very poorly with pointwise convergence: the pointwise limit of Riemann-integrable functions often fails to be Riemann integrable. Widening the definition to the Lebesgue integral remedies this failure. So, with Lebesgue integrals, we get (relatively easily, thanks to the work of giants):
- The famous convergence theorems: monotone convergence, Fatou’s lemma, dominated convergence.
- Various complete function spaces, $L^p(\Bbb{R}^n)$ for $1\leq p\leq \infty$, and, going a step further, the Sobolev spaces $W^{k,p}(\Bbb{R}^n)$ (an essential collection of spaces when studying PDEs).
- Lebesgue’s differentiation theorem, and hence the Lebesgue version of the two fundamental theorems of calculus (vast generalizations of those in the Riemann setting).
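A concrete illustration tying these bullets back to $\chi_{\Bbb{Q}\cap[0,1]}$: enumerate the rationals in $[0,1]$ as $q_1,q_2,\dots$ and set $f_n=\chi_{\{q_1,\dots,q_n\}}$. Each $f_n$ is Riemann integrable with $\int_0^1 f_n=0$ (it has only finitely many discontinuities), and $f_n\nearrow \chi_{\Bbb{Q}\cap[0,1]}$ pointwise, yet the limit is not Riemann integrable. In the Lebesgue theory there is no such obstruction: monotone convergence immediately gives
$$\int_{[0,1]}\chi_{\Bbb{Q}\cap[0,1]}\,d\lambda=\lim_{n\to\infty}\int_{[0,1]}f_n\,d\lambda=0.$$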
The first two bullet points address the previously mentioned deficits of the Riemann integral. The third says that the Lebesgue integral (with respect to the Lebesgue measure) is a strict generalization of the Riemann integral: anything the Riemann integral can do, the Lebesgue integral can do, and more. This is just an overview of the ‘main benefits’ of the Lebesgue integral with respect to Lebesgue measure, when we come at things from the perspective of ‘the Riemann integral is bad with limits’.
The “What is volume?” route:
Now, having arrived at the concepts of the Lebesgue integral and measure, one can say: hey, the concept of Lebesgue measure can obviously be studied from a more abstract perspective. So, rather than focusing so much on the integral (a linear map which assigns a number to each suitable function), one focuses on the measure itself. This is also a very natural starting point of investigation, and it gives us measure theory. We just take some basic primitive notions as our starting point: a set, a sigma-algebra so we can conveniently play around with its subsets, and a measure so we can assign numbers to these sets.
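For concreteness, these primitive notions amount to a measure space $(X,\mathcal{F},\mu)$: a set $X$, a $\sigma$-algebra $\mathcal{F}$ of subsets of $X$, and a map $\mu:\mathcal{F}\to[0,\infty]$ satisfying
$$\mu(\emptyset)=0\qquad\text{and}\qquad\mu\Big(\bigcup_{n=1}^\infty E_n\Big)=\sum_{n=1}^\infty\mu(E_n)$$
for every sequence of pairwise disjoint sets $E_n\in\mathcal{F}$ (countable additivity).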
This untethers us from the real line or $\Bbb{R}^n$, and allows us to ask about volumes in a more general context. And soon we realize that while the concept of ‘measure’ may have initially been motivated as a generalization of ‘volume’, it is far from limited to that interpretation. Very natural things like mass/charge density behave like (signed) measures. So, many natural things which people have always been interested in can be studied more formally, precisely, systematically, and generally once measure theory is introduced. The entirety of modern probability is built on this foundation: sample spaces, probability measures, etc.
Summary: Pros and the one Con of Lebesgue Integrals.
So, we now have a trifecta:
- measures: a vastly generalized notion of volume, applicable in a whole bunch of situations which previously had to be handled in an ad hoc manner.
- the Lebesgue integral with respect to arbitrary measures: this gives us the three famous convergence theorems.
- Lebesgue function spaces defined using the Lebesgue integral: we now have complete spaces. So this first of all gives us a vast collection of examples of Banach spaces, and secondly, it means that many practical problems one might face in analysis can be reformulated in the Banach space context, where we can appeal to the spectacularly powerful theorems of functional analysis.
And, contrary to what many people might say, the concepts behind the Lebesgue and Riemann integrals are nearly identical. For Lebesgue, you take your simple functions, define their integral in the obvious manner, and then, by taking an appropriate limit, define it for a much broader class of functions. For Riemann/Darboux integrals: partition your interval, consider upper/lower/Riemann sums, and then take a suitable limit (infimum of upper sums or supremum of lower sums). If one talks about Riemann integrals rigorously, then one necessarily has to introduce $\epsilon,\delta,\sup,\inf$, etc., so in that regard the Riemann integral isn’t really that easy either.
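To make the parallel concrete, here is a small numerical sketch (a hypothetical illustration of my own, not any standard algorithm): both approximations below compute $\int_0^1 x^2\,dx$, once by partitioning the domain (Riemann) and once by partitioning the range and weighting each level by the approximate measure of its preimage (a Lebesgue-style simple function).

```python
import numpy as np

# Hypothetical numerical sketch: integrate f(x) = x^2 on [0, 1] two ways.
f = lambda x: x**2

# Riemann: partition the *domain*, sum f(left endpoint) * subinterval width.
x = np.linspace(0.0, 1.0, 100_001)
riemann = float(np.sum(f(x[:-1]) * np.diff(x)))

# Lebesgue-style: partition the *range* into levels, and integrate the
# simple function taking the value levels[k] on {x : levels[k] <= f(x) < levels[k+1]},
# whose measure we estimate by sampling f on a fine grid (each sample carries mass dx).
samples = f(np.linspace(0.0, 1.0, 100_001))
dx = 1.0 / (len(samples) - 1)
levels = np.linspace(0.0, 1.0, 1_001)
lebesgue = 0.0
for k in range(len(levels) - 1):
    in_slab = (samples >= levels[k]) & (samples < levels[k + 1])
    lebesgue += levels[k] * np.count_nonzero(in_slab) * dx

print(riemann, lebesgue)  # both close to 1/3
```

In both cases the recipe is the same: integrate something simple exactly, then take a limit; the only difference is whether the chopping happens in the domain or in the range.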
The main technical difficulty with Lebesgue integrals on the real line is that, because we’re allowing for a much larger collection of sets (Lebesgue/Borel measurable sets, which form a $\sigma$-algebra), we have a lot more stuff to deal with. Much of that difficulty lies in proving the existence of the Lebesgue measure (or, more generally, in Carathéodory’s extension theorem). It is this bit of technicality (and also its vast generality) that, very understandably, prevents us from introducing Lebesgue integrals first.
The Lebesgue integral (and the accompanying measure theory) is better than the Riemann integral in every mathematical regard; the only thing holding it back is that students usually aren’t ready for it yet. One has to understand that first-year calculus students are not yet mature enough to appreciate the need for general measures and completeness (in fact, it was only after a full year of my calculus course using Spivak that I started to get glimpses of the importance of limits and completeness). This difficulty is compounded immensely by the fact that some students don’t even understand Riemann integrals (a thing whose sole purpose is to ‘add many things’): they conflate the Riemann integral with the FTC which facilitates its computation, thinking that “$\int_a^bf(x)\,dx$ is defined as $F(b)-F(a)$ where $F'(x)=f(x)$”.
Edit: Another Reason
Another reason why Lebesgue’s integral is good is the Radon-Nikodym theorem, which allows us to conveniently apply the general principle in functional analysis that one should study not just a given topological vector space (e.g. a Fréchet or Banach space) but also its dual. Rather than me talking further, here’s part of the introduction to Dieudonné’s Treatise on Analysis, Vol. II, Chapter 13 (Integration):
> Nowadays the purposes of a theory of integration are very different from what they were at the beginning of this century. If the aim was only to be able to integrate “very discontinuous” functions, integration would hardly have gone beyond the rather narrow confines of the “fine” theory of functions of one or more real variables. The reasons for the importance that Lebesgue’s concept of integral has acquired in modern analysis are of quite a different nature. One is that it leads naturally to the consideration of various new complete function spaces, which can be conveniently handled precisely because they are spaces of functions (or of classes of “equivalent” functions) and not just abstract objects, as is usually the case when one constructs the completion of a space. Another is that the theorem of Lebesgue-Nikodym and the properties of measures defined by densities $(13.15)$ give us a method for dealing with denumerable families of measures on a locally compact space, by fixing a basic measure and working with the densities relative to this basic measure (hence again with functions); this again proves to be extremely convenient. Here the modern point of view emerges: given a $\mu$-integrable function $f$, what is important is not the values taken by $f$ so much as the way in which $f$ operates on the space of bounded continuous functions by means of the linear mapping $g\mapsto \int fg\,d\mu$ (this mapping depends only on the equivalence class of $f$ and therefore does not change when we modify $f$ at the points of a set of measure zero). The development of this point of view will lead in Chapter XVII to the theory of distributions, which is a natural generalization of the notion of measure on differential manifolds.
Perhaps one final remark I can make, regarding the last sentence: Dieudonné (as does Bourbaki) defines (rather than proves, as many other texts do) a measure to be an element of the dual of the space of bounded continuous functions; this is why in the last sentence he says distributions (which live in the dual of the space of smooth compactly supported functions) are natural generalizations of measures on smooth manifolds.
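For reference, here is the statement of the Radon-Nikodym theorem mentioned earlier, in its standard $\sigma$-finite form: if $\mu,\nu$ are $\sigma$-finite measures on $(X,\mathcal{F})$ with $\nu\ll\mu$ (i.e. $\mu(E)=0$ implies $\nu(E)=0$), then there is a measurable $f:X\to[0,\infty)$, unique up to $\mu$-null sets, such that
$$\nu(E)=\int_E f\,d\mu\quad\text{for all }E\in\mathcal{F}.$$
One writes $f=\frac{d\nu}{d\mu}$ and calls it the density of $\nu$ with respect to $\mu$, which is exactly the ‘densities’ language in the quote above.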