17

I read on Wikipedia that Laplace was the first to evaluate

$$\int\nolimits_{-\infty}^\infty e^{-x^2} \, \mathrm dx$$

Does anybody know what he was doing that led him to that integral? Even better, can someone pose a natural problem that would lead to this integral?

Edit: Many of the answers make a connection to the normal distribution, but then the question now becomes: Where does the density function of the normal distribution come from? Mike Spivey's answer is in the spirit of what I am looking for: an explanation that a calculus student might understand.

echoone
  • 1,975
  • 3
    That integral is the distribution function for a normal random variable with mean 0 and standard deviation 1. – gary Sep 02 '11 at 21:29
  • You can show that this integrates to 1 by setting I=$\int e^{-x^2}$, then showing that $I^2$ integrates to 1 by using a second integral $\int e^{-y^2}$ – gary Sep 02 '11 at 21:38
  • @gary: I'm not sure how the integral, which is a number, could also be a distribution function. Even so, the integrand is only a density up to a constant, and the standard deviation of the corresponding normal random variable is certainly not one. :) – cardinal Sep 02 '11 at 22:30
  • Well, yes, I just don't know how to tex the limits of the integral in order to define a function whose argument is one of the endpoints of integration. And, yes, if standard deviation is 1, then we need a different constant C to turn it into a distribution function. – gary Sep 02 '11 at 22:40
  • 2
    @gary, one cannot modify the variance by multiplying the density by a constant. As cardinal mentioned, there is only one probability density proportional to $e^{-x^2}$ and neither its variance nor its standard deviation are $1$. Hence your first comment here (the one upvoted twice) is wrong. The second as well since $I^2$ is not $1$ either. – Did Sep 02 '11 at 22:59
  • @Didier: I argued the opposite; that, depending on the choice of variance, a different choice of C is needed, i.e., for the standard normal there is a given scaling factor C, and for other normals, different scaling constants C's would be needed for the integral (considering the limits of integration) to be a normal distribution. – gary Sep 02 '11 at 23:16
  • @gary: Well, it is difficult to know what you call $C$ since you do not say, but you seem to think that one could choose $C$ at will and get a probability density $Ce^{-x^2}$. So let me state this: There is one and only one $C$ such that $Ce^{-x^2}$ is a probability density on the real line. – Did Sep 02 '11 at 23:22
  • No, Didier; I am trying to say that, once the mean $\mu$ and the standard deviation $\sigma$ are given, we can find a constant $C$ so that $C e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ integrates to $1$ over $(-\infty, \infty)$. I tried to give a broad, general statement and not a rigorous one, since I thought the ref. to a normal was the main point, and that those that wanted more rigor would go to, e.g., Wikipedia. I would like to think that otherwise, the entry is "spiritually correct". – gary Sep 02 '11 at 23:34
  • Yes, I made a mistake in being careless in trying to give rigor, but I think the idea is the right one. – gary Sep 02 '11 at 23:35
  • 2
    @gary, OK. But the assertions that variance=1 or that I=1 are wrong, even spiritually... :-) – Did Sep 02 '11 at 23:38

7 Answers

19

You asked about a natural problem that leads to this integral. Here's a summary of the argument I give in my undergraduate probability theory class. (It's due to Dan Teague; he has the article here.)

Imagine throwing a dart at the origin in the plane. You're aiming at the origin, but there is some variability in your throws. The following assumptions perhaps seem reasonable.

  1. Errors do not depend on the orientation of the coordinate system.
  2. Errors in perpendicular directions are independent. (Being too high doesn't affect the probability of being too far to the right.)
  3. Large errors are less likely than small errors.

Let the probability of landing in a thin vertical strip from $x$ to $x + \Delta x$ be $p(x) \Delta x$. Similarly, let the probability of landing in a short horizontal strip from $y$ to $y + \Delta y$ be $p(y) \Delta y$. So the probability of the dart landing in the intersection of the two strips is $p(x) p(y) \Delta x \Delta y$. Since the orientation doesn't matter, any similar region $r$ units away from the origin has the same probability, and so we can express this probability in polar coordinates as $g(r) \Delta x \Delta y$; i.e., $g(r) = p(x) p(y)$.

Differentiating both sides of $g(r) = p(x) p(y)$ with respect to $\theta$ yields $0 = p(x) \frac{dp(y)}{d \theta} + p(y) \frac{dp(x)}{d \theta}$, since $g(r)$ does not depend on $\theta$. Using $x = r \cos \theta$, $y = r \sin \theta$, simplifying, and separating variables produces the differential equation $$\frac{p'(x)}{x p(x)} = \frac{p'(y)}{y p(y)}.$$

Now, we assumed that $x$ and $y$ are independent, yet this differential equation holds for any $x$ and $y$. This is only possible if, for some constant $C$, $$\frac{p'(x)}{x p(x)} = \frac{p'(y)}{y p(y)} = C.$$ Solving the $x$ version of this differential equation yields $$\frac{dp}{p} = Cx \, dx \Rightarrow \ln p = \frac{Cx^2}{2} + c \Rightarrow p(x) = Ae^{Cx^2/2}.$$ Finally, since large errors are less likely than small errors, $C$ must be negative; write $C = -k$ for some $k > 0$. So we have $$p(x) = A e^{-kx^2/2}.$$ Since $p(x)$ is a probability density function, $$\int_{-\infty}^{\infty} A e^{-kx^2/2} dx = 1,$$ which is just a scaled version of your original integral.

(A little more work shows that $A = \sqrt{k/2\pi}$. Also, if you think about it some, it makes sense that $k$ should be inversely related to the variability in your throwing. And for the normal pdf, we do in fact have $k = 1/\sigma^2$.)
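If it helps to see this numerically, here is a small sketch (my own, not part of the argument above) checking that with $A = \sqrt{k/2\pi}$ the function $Ae^{-kx^2/2}$ integrates to $1$ and has variance $1/k$; the values of $k$ are arbitrary illustrative choices:

```python
# Numerical sanity check (illustrative sketch): for a few arbitrary k > 0,
# A * exp(-k x^2 / 2) with A = sqrt(k / (2*pi)) should integrate to 1,
# and the corresponding variance should be 1/k.
import numpy as np
from scipy.integrate import quad

for k in (0.5, 1.0, 4.0):
    A = np.sqrt(k / (2 * np.pi))
    total, _ = quad(lambda x: A * np.exp(-k * x**2 / 2), -np.inf, np.inf)
    var, _ = quad(lambda x: x**2 * A * np.exp(-k * x**2 / 2), -np.inf, np.inf)
    print(f"k={k}: integral = {total:.6f}, variance = {var:.6f}, 1/k = {1/k:.6f}")
```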

Mike Spivey
  • 55,550
  • +1. Note that large errors HAVE to be less likely than small ones, otherwise C is nonnegative and your p cannot integrate to 1 (in other words, this is not a hypothesis but a consequence of your proof). – Did Sep 02 '11 at 23:32
  • This is an interesting answer. However, it is not clear to me why $p(x)\Delta x$ is the probability of landing in the vertical strip of size $\Delta x$. What is the justification for this? – echoone Sep 03 '11 at 00:09
  • 2
    @echoone: We assume that errors in the vertical and horizontal directions are independent, and so we can treat the horizontal distance separately from the vertical distance. For the small horizontal range in 1D from $x$ to $x + \Delta x$, the probability of falling in that range is $p(x) \Delta x$ (this is a standard interpretation of probability density functions). Adding the second dimension, and with the independence from $y$, that horizontal strip becomes a thin vertical strip of width $\Delta x$. If that doesn't help you can also look at the graph in Dan Teague's article linked above. – Mike Spivey Sep 03 '11 at 00:28
  • 1
    Heh, Monte Carlo. I like this answer. – J. M. ain't a mathematician Sep 03 '11 at 01:47
  • 3
    You might be interested in this fairly recent, and fairly light, paper: R. J. Tibshirani, A. Price and J. Taylor (2011), A statistician plays darts. J. Royal Stat. Soc.: Series A, 174: 213–226. An even lighter version of this appeared in Significance. – cardinal Sep 03 '11 at 04:45
  • The axioms say that for an integrable function $d$ (the probability density) on the line, $d(|x|)d(|y|)=d(\sqrt{x^2+y^2})$ and this can only happen when $\log |d(\sqrt{|x|})|$ is linear. – zyx Jul 12 '13 at 15:42
8

In the early 18th century, Abraham de Moivre wrote a book on probability called The Doctrine of Chances. He wrote in English because he had fled to England to escape the persecution of Protestants in France. He considered the probability distribution of the number of heads that appear when a fair coin is tossed $n$ times. The exact probability that the number of heads is $x$ takes a while to compute. The mean is $\mu=n/2$ and the standard deviation is $\sigma= \sqrt{n}/2$. Consider $$ \varphi(x) = (\text{some normalizing constant}) \cdot e^{-x^2/2},\text{ and }\Phi(x) = \int_{-\infty}^x \varphi(u)\,du, $$ where the constant is chosen so that $\varphi$ integrates to 1. de Moivre found the normalizing constant numerically, and later his friend James Stirling showed that it is $1/\sqrt{2\pi}$, which I think was stated in a later edition of de Moivre's book.

de Moivre showed that the cumulative probability distribution of the number of heads, evaluated at $x$, approaches $F(x)=\Phi((x-\mu)/\sigma)$ as $n$ grows. This was an early version of the central limit theorem. The probability that the number of heads is exactly $x$ is approximated by $F(x+1/2) - F(x-1/2)$.

That is a reason to consider this function.
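For a concrete sense of how good de Moivre's approximation is, here is a short sketch (my own illustration; the choice $n = 100$ and the values of $x$ are arbitrary) comparing the exact binomial probabilities with $F(x+1/2) - F(x-1/2)$:

```python
# Compare the exact probability of x heads in n fair-coin tosses with
# de Moivre's approximation F(x + 1/2) - F(x - 1/2), where
# F(x) = Phi((x - mu) / sigma), mu = n/2, sigma = sqrt(n)/2.
from math import comb, sqrt, erf

def Phi(z):
    # standard normal c.d.f., expressed via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 100
mu, sigma = n / 2, sqrt(n) / 2
for x in (45, 50, 55, 60):
    exact = comb(n, x) * 0.5 ** n
    approx = Phi((x + 0.5 - mu) / sigma) - Phi((x - 0.5 - mu) / sigma)
    print(f"x = {x}: exact = {exact:.6f}, approximation = {approx:.6f}")
```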

In the 19th century, Carl Gauss showed that least-squares estimates of regression coefficients coincide with maximum-likelihood estimates precisely if the cumulative probability distribution function (c.d.f.) of the errors is $x\mapsto\Phi(x/\sigma)$ for some $\sigma>0$. Apparently that's how the name "Gaussian" got attached to these functions.

James Clerk Maxwell showed that if $X_1,\dots,X_n$ are independent identically distributed random variables and their joint probability distribution is spherically symmetric in Euclidean space, then again, the c.d.f. must be that same "Gaussian" function.

2

The function $e^{-x^2}$ is a natural object of investigation for lots of different reasons. One reason is that, depending on your normalization, it is essentially a fixed point of the Fourier Transform. That is, $$ \int_{\mathbb{R}^n} e^{-\pi x^2} e^{-2\pi ix\cdot t}\mathrm{d}x=e^{-\pi t^2} $$ Another reason is tied to the Central Limit Theorem. Suppose that $f$ satisfies $\int_{\mathbb{R}^n}f(x)\;\mathrm{d}x=1$, $\int_{\mathbb{R}^n}x\;f(x)\;\mathrm{d}x=0$, and $\int_{\mathbb{R}^n}|x|^2\;f(x)\;\mathrm{d}x=1$ (these can be attained by translating and scaling the domain and scaling the range of $f$). Let $f^{\;*k}$ be the convolution of $f$ with itself $k$ times. Then $k^{n/2}f^{\;*k}(x\sqrt{k})\to \frac{1}{\sqrt{2\pi}^n}e^{-x^2/2}$ as $k\to\infty$.
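Here is a quick numerical check of the fixed-point claim in one dimension (my own sketch; the transform convention is the one written above, and the values of $t$ are arbitrary):

```python
# Check numerically that the Fourier transform of exp(-pi x^2), with the
# convention above, is again exp(-pi t^2).  The imaginary part vanishes by
# symmetry, so only the cosine part is computed.
import numpy as np
from scipy.integrate import quad

def gaussian_transform(t):
    value, _ = quad(lambda x: np.exp(-np.pi * x**2) * np.cos(2 * np.pi * x * t),
                    -np.inf, np.inf)
    return value

for t in (0.0, 0.5, 1.0, 2.0):
    print(f"t = {t}: transform = {gaussian_transform(t):.6f}, "
          f"exp(-pi t^2) = {np.exp(-np.pi * t**2):.6f}")
```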

robjohn
  • 345,667
2

My guess (and it is only a guess) is that Laplace was motivated by applications to the heat equation. As it turns out, a scaled Gaussian describes how heat propagates from a point in Euclidean space, so you can describe how heat propagates from an arbitrary initial distribution by adding up a bunch of Gaussians. Of course, an arbitrary initial distribution may be continuous, and then the sum turns into a convolution.

If you expect that the distribution of heat propagating from a point is proportional to a Gaussian (which I guess you can motivate by heuristically applying the central limit theorem), then the proportionality constant is a Gaussian integral, which you now need to know the value of.
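To make the connection concrete, here is a small symbolic sketch (mine, not Laplace's or this answer's) showing that the scaled Gaussian $u(x,t) = \frac{1}{\sqrt{4\pi t}}e^{-x^2/(4t)}$ solves the one-dimensional heat equation $u_t = u_{xx}$, and that the normalizing constant $\frac{1}{\sqrt{4\pi t}}$ is exactly what a Gaussian integral forces on you if the total amount of heat is to stay equal to $1$:

```python
# Symbolic check: the 1-D heat kernel is a scaled Gaussian whose
# normalization is fixed by a Gaussian integral (total heat = 1),
# and it satisfies u_t = u_xx.
import sympy as sp

x = sp.symbols('x', real=True)
t = sp.symbols('t', positive=True)
u = sp.exp(-x**2 / (4 * t)) / sp.sqrt(4 * sp.pi * t)

total_heat = sp.integrate(u, (x, -sp.oo, sp.oo))          # a scaled Gaussian integral
residual = sp.simplify(sp.diff(u, t) - sp.diff(u, x, 2))  # u_t - u_xx

print(total_heat)   # 1
print(residual)     # 0
```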

Qiaochu Yuan
  • 419,620
  • I think this relates to what I said, that $e^{-x^2}$ is a fixed point of the Fourier Transform (up to possible scaling in the domain and range), and that it is a convolution limit (ala Central Limit Theorem). Most of the motivations so far have some link to the Fourier Transform. – robjohn Sep 02 '11 at 23:29
  • Hmm. I think I might be wrong. It looks from the Wikipedia article like Laplace was motivated by the central limit theorem. – Qiaochu Yuan Sep 02 '11 at 23:32
  • I think it was Glaisher and the other guys at Cambridge who were motivated by the heat equation in studying the error function, not Laplace. See my answer here – J. M. ain't a mathematician Sep 03 '11 at 01:50
  • 4
    Laplace treated the function $\int_x^\infty e^{-t^2} \mathrm{d}t$ in two main papers that I am aware of. (1) Pierre-Simon Laplace (1782). Memoire sur les approximations des formules qui sont fonctions de tres grands nombres. Memoires de mathematique et de physique tires des registres de l’Academie Royales des Sciences, pages 1–88, and (2) Pierre-Simon Laplace (1805). Traite de Mecanique Celeste, volume 4. Courcier, Paris, pp. 254ff. I have digital copies of both; I'll try to reread the latter and recall his focus in that section. – cardinal Sep 03 '11 at 04:33
1

The function $e^{-x^2}$ is proportional to the probability density of the normal distribution (with mean 0, variance $1/2$). So you want to find the constant $C$ to make this a probability density:

$$\int_{-\infty}^{\infty}\!Ce^{-x^2}\,dx=1$$

(since you want the total probability to be 1).
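A minimal numerical sketch (my own, not part of this answer) of what that normalization forces: since $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$, the constant has to be $C = 1/\sqrt{\pi}$.

```python
# Solve for the constant C numerically: C = 1 / (integral of exp(-x^2)).
import numpy as np
from scipy.integrate import quad

total, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
print(total, np.sqrt(np.pi))   # both are about 1.7724538509
print(1 / total)               # C is about 0.5641895835 = 1/sqrt(pi)
```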

I suspect that's the historic reason as well.

1

The integral you gave, when taken as a definite integral

$\int^{x_2}_{x_1} e^{-x^2} dx$

and scaled by $\frac {1}{\pi^{0.5}}$,

describes a normally distributed random variable $X$ with mean $0$ and standard deviation $\frac {1}{2^{0.5}}$: the numerical value of this scaled integral gives you the probability of the event $x_1 \leq X\leq x_2$.

When an integral of this form is scaled by the right factor $K$ (with the exponent adjusted accordingly), it describes a family of normal distributions with mean $\mu$ and standard deviation $\sigma$.

You can show that the integral over the whole real line equals a constant $K$ (so that when you divide by $K$, the value of the integral is $1$, which is what makes the integrand into a density function) by using the following trick (stated here for the case mean $=0$):

Set $I=\int e^{-x^2}\,dx$, then consider $\int e^{-y^2}\,dy$, and then compute their product (using the fact that $e^{-x^2}$ is a constant when considered as a function of $y$, and vice versa for $x$):

$I^2=\int\!\!\int e^{-(x^2+y^2)}\,dx\,dy$, which can be evaluated using a polar change of variables, $x^2+y^2=r^2$ (and, of course, a corresponding change of the region of integration).
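Here is a small numerical companion to this trick (my own sketch, not part of the answer): it computes the double integral directly and confirms that it equals $\pi$, so $I = \sqrt{\pi}$.

```python
# Numerical version of the polar-coordinates trick: the double integral of
# exp(-(x^2 + y^2)) over the plane equals pi, hence I = sqrt(pi).
import numpy as np
from scipy.integrate import quad, dblquad

I, _ = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)
I_squared, _ = dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
                       -np.inf, np.inf,
                       lambda x: -np.inf, lambda x: np.inf)
print(I**2, I_squared, np.pi)   # all three are about 3.14159
```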

The integral is based on non-mathematical assumptions too:

http://www.stat.tamu.edu/~genton/2007.AG.Bernoulli.pdf

gary
  • 4,027
  • 1
    Sorry but $(2\pi)^{1/2}$ is wrong. – Did Sep 02 '11 at 22:19
  • Didier: I think this correction does it; please let me know otherwise. – gary Sep 02 '11 at 22:45
  • To be sure, you don't mean that the indefinite integral describes the probability distribution function, right? Perhaps you can put the limits to make everything clear? – Srivatsan Sep 02 '11 at 22:50
  • No, of course not; the distribution is a function of the limits of integration, as with all distribution functions. – gary Sep 02 '11 at 23:07
  • Srivatsan: given that I am either wrong in a lot of things today, or I'm being misunderstood--or misunderestimated--let me leave it at that for now; sorry, I am tired of replying to comments; sorry. – gary Sep 02 '11 at 23:43
  • O.K: here is my revision. Would you comment? – gary Sep 12 '11 at 00:18
0

[This is similar to Mike Spivey’s answer above]

Let random variables ${ X, Y }$ be i.i.d. with density ${ f }.$ We will pick ${ f }$ so that the normalised random vector ${ \left( \frac{X}{\sqrt{X ^2 + Y ^2}}, \frac{Y}{\sqrt{X ^2 + Y ^2}} \right) }$ is uniform over the unit circle.

The density of ${ (X,Y) }$ at ${ (x,y) }$ is ${ f _{X,Y} (x,y) = f(x)f(y) }.$

Pick a segment on the circle from angle ${ \theta _0 }$ to ${ \theta _0 + \Delta \theta }.$ Probability that random vector ${ \left( \frac{X}{\sqrt{X ^2 + Y ^2}}, \frac{Y}{\sqrt{X ^2 + Y ^2}} \right) }$ falls in this segment is the probability that ${ (X,Y) }$ falls in the infinite sector from angle ${ \theta _0 }$ to ${ \theta _0 + \Delta \theta }.$

Since ${ \left( \frac{X}{\sqrt{X ^2 + Y ^2}}, \frac{Y}{\sqrt{X ^2 + Y ^2}} \right) }$ is uniform over the unit circle, this probability is $${ \frac{\Delta \theta}{2\pi} = \int _{\theta _0} ^{\theta _0 + \Delta \theta} \int _{0} ^{\infty} f(r \cos \theta) f(r \sin \theta) \, r \, dr \, d\theta, }$$ that is, $${ \frac{\Delta \theta}{2\pi} \approx \Delta \theta \int _{0} ^{\infty} f(r\cos \theta _0) f(r \sin \theta _0) \, r \, dr. }$$ The integral ${ \int _{0} ^{\infty} f(r \cos \theta _0) f(r \sin \theta _0) \, r \, dr }$ is independent of ${ \theta _0 }.$ This will happen if ${ f(r \cos \theta) f(r \sin \theta) }$ is independent of ${ \theta }.$ So setting ${ \frac{d}{d\theta} f(r \cos\theta) f(r \sin \theta) = 0 }$ gives $${ f'(r\cos \theta) (-r\sin \theta) f(r \sin \theta) + f(r\cos\theta) f'(r\sin \theta) (r \cos \theta) = 0, }$$ that is, $${ f'(x) (-y) f(y) + f(x) f'(y) x = 0, }$$ that is, $${ \frac{1}{x} \frac{f'(x)}{f(x)} = \frac{1}{y} \frac{f'(y)}{f(y)}. }$$ Since this holds at all ${ (x,y) },$ we can set it to a constant: ${ \frac{1}{x} \frac{f'(x)}{f(x)} = \frac{1}{y} \frac{f'(y)}{f(y)} = C }.$ Now integration gives $${ f(x) = A e ^{C \frac{x ^2}{2}} }$$ as needed. (Since ${ f }$ is a density, ${ C }$ must be negative.)
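A quick Monte Carlo illustration (my own sketch, not part of the argument) of the defining property: with $X, Y$ i.i.d. with density proportional to $e^{-x^2/2}$, the angle of $(X,Y)$ should be uniform, so equal angular bins should receive roughly equal fractions of samples. The sample size and number of bins below are arbitrary choices.

```python
# Monte Carlo check that (X, Y) / sqrt(X^2 + Y^2) is uniform on the unit
# circle when X, Y are i.i.d. with density proportional to exp(-x^2/2).
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)
Y = rng.standard_normal(1_000_000)

angles = np.arctan2(Y, X)
counts, _ = np.histogram(angles, bins=8, range=(-np.pi, np.pi))
print(counts / counts.sum())   # each entry should be close to 1/8 = 0.125
```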

The Gaussian also arises as the approximate density of normalised sums, as here and here.