
Imagine I generate $N$ real numbers uniformly distributed between $0$ and $1$. I sort them in ascending order and calculate the differences between each consecutive pair.

For example, for $N = 3$, it would be like this:

[Image: example with $N = 3$ (not available)]

I would like to know the expected value of these differences, $\Delta$. Each pair will have a different $\Delta$, but I'm only interested in the average expected value of all the $\Delta$s.

As I don't know how to calculate it analytically, I've done it with a simulation instead (I'm neither a mathematician nor a statistician; I just work with computers). What I found is that if I have $N$ numbers, the average distance between consecutive numbers is $\frac1{1+N}$, and that's also the average distance between the first number and zero.
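A minimal sketch of that kind of simulation, assuming the numbers are drawn independently with Python's `random` module. The function name `simulate_gaps` and the `trials`/`seed` parameters are hypothetical choices for illustration, not the asker's actual code. It estimates both the mean consecutive gap and the mean distance from $0$ to the smallest number, and prints them next to $\frac{1}{N+1}$:

```python
import random

def simulate_gaps(n, trials=20_000, seed=0):
    """Monte Carlo estimates of the mean consecutive gap and of E[smallest value].

    Hypothetical helper: draws n independent Uniform(0, 1) numbers per trial.
    """
    if n < 2:
        raise ValueError("need at least two numbers to have a gap")
    rng = random.Random(seed)
    gap_total = 0.0
    first_total = 0.0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        # average of the n-1 differences between consecutive sorted values
        gap_total += sum(b - a for a, b in zip(xs, xs[1:])) / (n - 1)
        # distance from 0 to the smallest value
        first_total += xs[0]
    return gap_total / trials, first_total / trials

for n in (2, 3, 5, 10):
    mean_gap, mean_first = simulate_gaps(n)
    # columns: N, simulated mean gap, simulated mean of the first number, 1/(N+1)
    print(n, round(mean_gap, 4), round(mean_first, 4), round(1 / (n + 1), 4))
```

Both simulated columns should land close to the last one, up to Monte Carlo noise.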

I would like to know how to calculate this with equations. Intuitively I think it's the same as calculating $E\left[|X_i-X_j|\right]$ where $X_i$ and $X_j$ are two neighboring numbers in that sample.

In general the expected value is calculated as: $$E[X]=\int_{-\infty}^\infty xf(x)\,dx$$

I think here we should integrate $|X_i-X_j|$, but I don't know $f(x)$, the distribution of the differences. I can't assume the numbers are independent once we sort them and take the nearest pairs, and the absolute value complicates the calculation a little more.

There is an apparently similar question, but it asks about the minimum distance among all pairs.

StubbornAtom
skan
  • It seems you need the distribution of the difference; the data distribution is given, so can't you take its derivative as the required distribution? – Creator Jan 27 '20 at 22:22
  • Are you thinking about relating the increment with the derivative? But that will only work when N -> ∞. And in my example I'm speaking about a small N. – skan Jan 27 '20 at 23:31

3 Answers


Here's a somewhat more roundabout way of obtaining the result, assuming the originally chosen numbers $\ Y_1, Y_2, \dots, Y_N\ $ are independent.

For $\ N\ge2\ $ the arithmetic mean difference between the ordered numbers is $\ \Delta=\frac{\sum\limits_{i=1}^{N-1} \left(X_{i+1}-X_i\right)}{N-1}=\frac{X_N-X_1}{N-1}\ $, since the sum telescopes. So we only need the joint distribution of $\ X_1, X_N\ $, which can be calculated from
\begin{align}
P\left(a\le X_1, X_N\le b\right)&=P\left(a\le Y_1,Y_2,\dots,Y_N\le b\right)\\
&=\cases{\left(\min(b,1)-\max(a,0)\right)^N& if $\ b>\max(a,0) $\\ 0& otherwise}
\end{align}
and
\begin{align}
P\left(X_N\le b\right)&=P\left(Y_1,Y_2,\dots,Y_N\le b\right)\\
&=\cases{\min(b,1)^N&if $\ b>0$\\ 0& otherwise}
\end{align}
so that
\begin{align}
P \left(X_1\le a, X_N\le b\right)&= P\left(X_N\le b\right)-P\left(a\le X_1, X_N\le b\right)\\
&=\cases{\min(b,1)^N-\left(\min(b,1)-\max(a,0)\right)^N & if $\ b>\max(a,0) $\\ \min(b,1)^N & if $\ 0<b\le\max(a,0) $\\ 0&otherwise}
\end{align}
Differentiating with respect to $\ a\ $ and $\ b\ $, the joint density function $\ f(x,y)\ $ of $\ X_1,X_N\ $ is therefore given by
\begin{align}
f(x,y)&=\cases{N(N-1)\left(y-x\right)^{N-2}& if $\ 0\le x<y\le1$\\ 0& otherwise}
\end{align}
and the expectation $\ E(\Delta)\ $ of $\ \Delta\ $ by
\begin{align}
E(\Delta)&=\int_0^1\int_x^1\frac{y-x}{N-1}\cdot N(N-1)(y-x)^{N-2}dydx\\
&= N\int_0^1\int_x^1(y-x)^{N-1}dydx\\
&=\int_0^1(1-x)^Ndx\\
&= \frac{1}{N+1}
\end{align}
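As a numerical sanity check on the final double integral (not part of the original answer), the sketch below evaluates $\int_0^1\int_x^1\frac{y-x}{N-1}\,f(x,y)\,dy\,dx$ with a simple midpoint rule over the grid cells strictly above the diagonal $y=x$. The name `expected_mean_gap` and the grid size `m` are hypothetical; the skipped diagonal cells and the midpoint rule itself introduce only a small discretisation error.

```python
def expected_mean_gap(n, m=400):
    """Midpoint-rule approximation of E(Delta) using the joint density of (X_1, X_N).

    Hypothetical helper for illustration; requires n >= 2.
    """
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * h
        for j in range(i + 1, m):  # cells strictly above the diagonal y = x
            y = (j + 0.5) * h
            density = n * (n - 1) * (y - x) ** (n - 2)  # f(x, y) for 0 <= x < y <= 1
            total += (y - x) / (n - 1) * density * h * h
    return total

for n in (2, 3, 4, 5, 10):
    # columns: N, numerical value of the integral, exact 1/(N+1)
    print(n, round(expected_mean_gap(n), 5), round(1 / (n + 1), 5))
```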

lonza leggiera
  • Why are you using two variables, X and Y? How are they related? – skan Jan 28 '20 at 17:31
  • What confused me is that you started talking about Y – skan Jan 30 '20 at 00:55
  • Oh, I'm sorry. $\ Y_1, Y_2, \dots, Y_N\ $ are just the uniformly distributed random numbers originally chosen before they were reordered to get $\ X_1, X_2,\dots, X_N\ $. For the derivation to work, the $\ Y$s have to be assumed independent, even though the $\ X$s won't be. In fact, the result won't necessarily be true if the $\ Y$s aren't independent. – lonza leggiera Jan 30 '20 at 01:03

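Since there are $N+1$ subintervals (including the piece from $0$ to the smallest number and the piece from the largest number to $1$) and their lengths add to $1$, the average subinterval length is $\frac{1}{N+1}$. Moreover, by symmetry all $N+1$ subintervals have the same distribution, so each one, and in particular each gap between consecutive numbers, has expected length $\frac{1}{N+1}$.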
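The symmetry claim can be checked empirically. This is a minimal sketch, assuming independent uniform draws from Python's `random` module; the helper name `mean_spacings` is hypothetical. It averages the length of each of the $N+1$ pieces separately, end pieces included:

```python
import random

def mean_spacings(n, trials=20_000, seed=0):
    """Average length of each of the n+1 pieces that n sorted uniforms cut [0, 1] into.

    Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    sums = [0.0] * (n + 1)
    for _ in range(trials):
        # pad with the interval endpoints so the end pieces are included
        cuts = [0.0] + sorted(rng.random() for _ in range(n)) + [1.0]
        for k in range(n + 1):
            sums[k] += cuts[k + 1] - cuts[k]
    return [s / trials for s in sums]

print([round(v, 3) for v in mean_spacings(4)])  # each entry should be close to 1/5
```

The lengths in every trial sum to exactly $1$, so the averages always sum to $1$; the simulation shows that they are also (approximately) equal to each other, which is the extra fact needed to go from the average over all pieces to the expected length of each gap.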

paw88789

It can be proven that the expected value of the $k$-th smallest number $X_k$ is $\frac{k}{n+1}$ (it has a $\mathrm{Beta}(k,n+1-k)$ distribution). By linearity of expectation we have: $$\mathbb{E}[X_{i+1}-X_i]=\frac{i+1}{n+1}-\frac{i}{n+1}=\frac{1}{n+1}$$

We can give a simple proof of the assertion at the beginning as follows: imagine that we sample an additional point, let's call it $X$, from the same distribution independently of all the others. Since $P(X<t)=t$ for every $t\in[0,1]$, the expected value of the $k$-th smallest number is equal to the probability that $X$ is smaller than the $k$-th smallest number (not counting $X$), i.e. that $X$ lands in position $1$, $2$, ..., $k$ once it is counted. But since there are $n+1$ points and each position of $X$ is equally likely, this probability is simply $\frac{k}{n+1}$, as expected.
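The "extra point" argument can be illustrated with a short simulation. This is a sketch under the assumption of independent draws from Python's `random` module; `check_order_statistic` is a hypothetical helper name. For each $k$ it estimates both $\mathbb{E}[X_k]$ and the probability that an extra independent uniform point falls below $X_k$, and compares them with $\frac{k}{n+1}$:

```python
import random

def check_order_statistic(n, k, trials=20_000, seed=0):
    """Estimate E[X_k] and P(X < X_k) for an extra independent uniform point X.

    Hypothetical helper; k is 1-based (k = 1 is the smallest of the n numbers).
    """
    rng = random.Random(seed)
    kth_total = 0.0
    below = 0
    for _ in range(trials):
        xs = sorted(rng.random() for _ in range(n))
        kth = xs[k - 1]         # k-th smallest of the original n points
        kth_total += kth
        if rng.random() < kth:  # the extra point X lands below X_k
            below += 1
    return kth_total / trials, below / trials

n = 5
for k in range(1, n + 1):
    est_mean, est_prob = check_order_statistic(n, k)
    # columns: k, estimated E[X_k], estimated P(X < X_k), exact k/(n+1)
    print(k, round(est_mean, 3), round(est_prob, 3), round(k / (n + 1), 3))
```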

Bartek