With regards to measure theory, why do we use different differentials in the integrand?

Question

I remember my lecturer once telling me about the importance of measure theory. She asked why we use different differentials such as $dx, dF, dP$ and $d\mu$ in an integrand, depending on what we want. I was told to look up why, and now that I am teaching myself measure theory I am very interested in the answer. There is a very good answer for the case of $dx$, but why do we use different differentials, and how do we know which to use when computing integrals?

Note: So far I have only learnt about $dx$ and the basics of integrating with respect to a measure $\mu$. I think I am right in saying that $dP$ and $dF$ are used for distributions. Are there other interesting examples you know of?

Note that if you know how to integrate with respect to measures, and so are used to writing $d \mu$, then you already know how to integrate with distributions, writing $dP$ and $dF$, because distributions are measures. — aduh, Aug 24 '16 at 16:46

Wavelet · Accepted Answer · 2016-08-24T21:30:22.247

Just for one example, integrating w.r.t. different measures allows us to represent every positive linear functional on the space of continuous, compactly supported functions via the Riesz-Markov-Kakutani Representation Theorem. For example, take the point evaluation functional $J_{a}:C_{c}(X) \to \mathbb{R}$, where $X$ is some compact Hausdorff space and $a \in X$, given by $$ J_{a}[f]=f(a). $$ We can write this as an integral with respect to the Dirac measure, which is given by $$ \delta_{a}(A)= \begin{cases} 1 & \text{ if } a \in A \\ 0 & \text{ if } a \notin A \end{cases} $$ so that $$ J_{a}[f]=\int_{X} f \, d\delta_{a}=f(a). $$ There are some technical considerations here (concerning regularity, and that the measure must be locally finite), but the main idea is that any positive linear functional on this space is given by integration w.r.t. a unique regular Borel measure.
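As a rough illustration of the point evaluation example, here is a minimal Python sketch in which a measure made of finitely many point masses is stored as a dictionary from points to weights. The names `integrate_discrete` and `dirac` are made up for this sketch and do not come from any library.

```python
from typing import Callable, Dict

# Hypothetical representation: a discrete measure sum_i w_i * delta_{x_i}
# is stored as a dict {x_i: w_i}.
def integrate_discrete(f: Callable[[float], float],
                       masses: Dict[float, float]) -> float:
    """Integral of f against the discrete measure described by `masses`."""
    return sum(w * f(x) for x, w in masses.items())

def dirac(a: float) -> Dict[float, float]:
    """The Dirac measure delta_a: a single unit mass at the point a."""
    return {a: 1.0}

f = lambda x: x**2 + 1

# Integrating against delta_3 is just evaluating f at 3.
value = integrate_discrete(f, dirac(3.0))

# A measure with mass 1/2 at 0 and mass 1/2 at 2 averages f(0) and f(2).
average = integrate_discrete(f, {0.0: 0.5, 2.0: 0.5})
```

Here `value` equals `f(3.0)`, so the "integral" against $\delta_{a}$ really is the functional $J_{a}$; changing the measure changes which functional the integral computes.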

Also, when you say that $dP$ and $dF$ are used for distributions, you are on the right track. Given a probability space $(\Omega, \mathcal{F},\mathbb{P})$, the measurable space $(\mathbb{R},\mathcal{B}(\mathbb{R}))$ and a random variable $X:\Omega \to \mathbb{R}$, the probability distribution of $X$ is the measure $\mathbb{P}^{*}$ on $\mathcal{B}(\mathbb{R})$ defined by $$ \mathbb{P}^{*}(A)=\mathbb{P}(X \in A)=\int_{\Omega} \mathbb{1}_{X^{-1}(A)}\, d\mathbb{P}. $$ If $\mathbb{P}^{*}$ is absolutely continuous w.r.t. the Lebesgue measure $\mu$, then we also have $$ \mathbb{P}^{*}(A)=\int_{A} f \, d\mu, $$ where $f$ is the density of $\mathbb{P}^{*}$ w.r.t. $\mu$ (i.e. $f$ is the Radon-Nikodym derivative of $\mathbb{P}^{*}$ w.r.t. $\mu$). The notation $dF$ refers to the same integral, written in terms of the distribution function $F(x)=\mathbb{P}^{*}((-\infty,x])$.
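To make the identity $\mathbb{P}^{*}(A)=\mathbb{P}(X^{-1}(A))$ concrete, here is a small sketch on a finite probability space. The setup (two fair dice, with $X$ their sum) and the names `omega`, `P`, `X` and `distribution` are assumptions chosen for illustration; on a finite space every integral is a finite sum, so the arithmetic can be done exactly with fractions.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair dice, with the uniform measure P.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]

def distribution(A):
    """P*(A) = P(X^{-1}(A)): total P-mass of the outcomes that X maps into A."""
    return sum(P[w] for w in omega if X(w) in A)

p_seven = distribution({7})
p_high = distribution({10, 11, 12})

# The same expectation computed two ways: as an integral of X over Omega
# w.r.t. P, and as an integral of the identity over the reals w.r.t. P*.
mean_on_omega = sum(X(w) * P[w] for w in omega)
mean_on_reals = sum(k * distribution({k}) for k in range(2, 13))
```

Both `p_seven` and `p_high` come out as `Fraction(1, 6)`, and the two expectations agree (both are 7). That agreement is the reason one can compute with $d\mathbb{P}^{*}$ on $\mathbb{R}$ instead of $d\mathbb{P}$ on $\Omega$.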

To answer your concern about computation: whenever possible, we use the Radon-Nikodym theorem to write a measure in terms of the Lebesgue measure. This is not always possible. For example, the point evaluation functional cannot be written as an integral w.r.t. the Lebesgue measure, because the Dirac measure and the Lebesgue measure are mutually singular. But for any $\sigma$-finite measure $\nu$ that is absolutely continuous w.r.t. the Lebesgue measure $\mu$, there is a nonnegative measurable function $f$ such that $$ \nu(A)=\int_{A} f \, d\mu $$ for every Borel set $A$, and then $\int g \, d\nu=\int g f \, d\mu$, which is an ordinary Lebesgue integral.
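In practice, once a density is available, integrating against $\nu$ reduces to integrating $g f$ against Lebesgue measure. The sketch below checks this numerically under some assumptions of mine: $\nu$ is taken to be the exponential distribution with rate 1 (density $e^{-x}$ on $[0,\infty)$), the Lebesgue integral is approximated by a midpoint rule, and the tail beyond $x=50$ is dropped. The helper names are hypothetical.

```python
import math

def density(x: float) -> float:
    """Assumed density f = d(nu)/d(mu): the Exp(1) density on [0, inf)."""
    return math.exp(-x)

def integrate_lebesgue(h, a: float, b: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def integrate_nu(g, a: float = 0.0, b: float = 50.0) -> float:
    """Approximate the integral of g d(nu) as the integral of g * f d(mu)."""
    return integrate_lebesgue(lambda x: g(x) * density(x), a, b)

mass = integrate_nu(lambda x: 1.0)  # nu of the whole half-line, close to 1
mean = integrate_nu(lambda x: x)    # first moment of Exp(1), close to 1
```

No such density exists for the Dirac measure, which is why the point evaluation functional has to be handled with $d\delta_{a}$ directly rather than with $dx$.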
