
Let $(U_{(1)},\dots,U_{(n)})$ be the order statistics of $n$ i.i.d. uniform random variables on $(0,1)$, so that $U_{(1)}$ is the minimum. Let $(E_i)_{1\le i\le n}$ be $n$ i.i.d. exponential random variables with parameter $1$, independent of everything else.

Is it possible to write a formula for $\mathbb{P}(E_1 U_{(i)}\geq E_i U_{(1)},\ \forall i\geq2)$? Is there at least a nice formula as $n$ tends to infinity?

RobPratt
foubw
  • Without doing the calculations, it could be possible to find the probability that, conditioned on $U_{(1)}=x$ and $E_1=k$, a point $Y$ uniformly distributed on $[x,1]$ with $E_y$ exponentially distributed satisfies $\frac{Y}{E_y} > \frac{x}{k}$; then raise this to the $(n-1)$th power, and finally integrate over the distributions of $U_{(1)}$ and $E_1$ – Henry Aug 31 '23 at 21:10
    Trying it gives an ugly integral - simulation would be an alternative – Henry Aug 31 '23 at 21:22
  • My (unchecked) simulation attempt suggests a probability of almost $0.7$ when $n=2$, falling as $n$ increases, but for $n\ge 9$ perhaps always between $0.55$ and $0.56$. – Henry Aug 31 '23 at 21:50
  • A nice approximate formula is found, see my answer. – NN2 Sep 05 '23 at 13:53

2 Answers


I suspect you could do something like $$ \int_{x=0}^1 \int_{k=0}^\infty \left(\int_{y=x}^1 (1-e^{-yk/x}) \, dy\right)^{n-1} ne^{-k}\, dk \, dx $$ where $x$ is your $U_{(1)}$, $k$ is your $E_1$ and $y$ represents the higher values of $U_i$. I doubt this is easy to calculate beyond $n=2$ where it gives $\log_e(2) \approx 0.693$.
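For what it's worth, the inner integral has a closed form, $\int_{y=x}^1 (1-e^{-yk/x})\,dy = (1-x) - \frac{x}{k}\left(e^{-k}-e^{-k/x}\right)$, which makes a quick numerical check easy. A small Python sketch of the triple integral (the substitution $u=e^{-k}$, grid size, and function names are my choices):

```python
import math

def inner(x, k):
    # closed form of the inner integral: int_x^1 (1 - exp(-y*k/x)) dy
    return (1.0 - x) - (x / k) * (math.exp(-k) - math.exp(-k / x))

def henry_integral(n, N=400):
    # substitute u = exp(-k), so exp(-k) dk = du maps k in (0, inf) to u in (0, 1);
    # then apply the midpoint rule on the unit square in (x, u)
    total = 0.0
    for i in range(N):
        x = (i + 0.5) / N
        for j in range(N):
            u = (j + 0.5) / N
            k = -math.log(u)
            total += inner(x, k) ** (n - 1) * n
    return total / (N * N)

print(henry_integral(2))   # close to log(2) ≈ 0.6931
```

For $n=2$ the midpoint estimate agrees with the exact value $\log_e(2)$ to roughly three decimal places, and the same routine evaluates the integral for larger $n$.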

But I think simulation should work. The probability seems to decrease as $n$ increases, quickly getting below $0.56$ but never below $0.55$. Using R, where the error with a million simulations each should usually be smaller than $\pm 0.001$:

probinequality <- function(n){
  u <- runif(n)                     # n uniforms on (0,1)
  ei <- rexp(n)                     # n unit-rate exponentials
  minu <- min(u)                    # U_(1)
  argmin <- which.min(u)            # index of the minimum
  all(ei[argmin] * u >= ei * minu)  # E_1 U_(i) >= E_i U_(1) for all i
}

set.seed(2023) # for replication - use different seed for different values

sims <- replicate(10^6, probinequality(n=2))
mean(sims)

0.693554 # true value is log(2) ≈ 0.693147

sims <- replicate(10^6, probinequality(n=3))
mean(sims)

0.618249

sims <- replicate(10^6, probinequality(n=4))
mean(sims)

0.589736

sims <- replicate(10^6, probinequality(n=10))
mean(sims)

0.557648

sims <- replicate(10^6, probinequality(n=1000))
mean(sims)

0.55486
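For readers without R, here is a rough Python port of the same simulation, using only the standard library (a sketch; the function name is mine):

```python
import random

def prob_inequality(n, sims=200_000, seed=2023):
    # Monte Carlo estimate of P(E_1 * U_(i) >= E_i * U_(1) for all i >= 2);
    # as in the R code above, the exponential paired with the minimal uniform
    # plays the role of E_1, which is valid since the E_i are i.i.d. and
    # independent of the U_i
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        u = [rng.random() for _ in range(n)]
        e = [rng.expovariate(1.0) for _ in range(n)]
        j = min(range(n), key=u.__getitem__)  # index of U_(1)
        if all(e[j] * u[i] >= e[i] * u[j] for i in range(n)):
            hits += 1
    return hits / sims

print(prob_inequality(2))   # should be near log(2) ≈ 0.693
```

With a few hundred thousand samples per value of $n$ it reproduces the figures above to within Monte Carlo error.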

Henry
  • Is there any hope that the integral would be simpler when $n$ tends to infinity? – foubw Aug 31 '23 at 22:51
  • Also, how do you get the integral $\int_{y=x}^1 (1-e^{-yk/x}) \, dy$ above? – foubw Aug 31 '23 at 23:12
  • I do not see an easy simplification as $n$ increases. $(1-e^{-yk/x})$ is the probability that $E_i \le U_{(i)} E_1 / U_{(1)}$ if $U_{(i)}=y, E_1=k, U_{(1)}=x$. You then need to multiply this by the uniform density for $y$ of $\frac1{1-x}$, but this term (when raised to the $(n-1)$th power because you want it true for all the other $(n-1)$ points) largely cancels with the $n(1-x)^{n-1}$ density for $x$; this uses the fact that the unsorted $U_i > x$ are uniformly distributed on $[x,1]$. – Henry Sep 01 '23 at 00:48
  • @Henry Could you provide a proof of your formula? – NN2 Sep 05 '23 at 14:59
  • @NN2 As I said in my previous comment, $\mathbb{P}(E_1 U_{(i)}\geq E_i U_{(1)} \mid U_{(i)}=y, E_1=k, U_{(1)}=x) = (1-e^{-yk/x})$. The $n-1$ values of $U_i$ which are not $U_{(1)}$ are conditionally uniformly distributed on $[x,1]$, so for each one the probability is $\int_{y=x}^1 \frac{(1-e^{-yk/x})}{1-x} \, dy$ and for all of them $\frac1{(1-x)^{n-1}} \left(\int_{y=x}^1 (1-e^{-yk/x}) \, dy\right)^{n-1}$. Then integrate over $k$ with density $e^{-k}$ and over $x$ with density $n(1-x)^{n-1}$, noting part of this cancels, to get my expression. I have only checked it for $n=2$. – Henry Sep 05 '23 at 15:20
    My $1-e^{-yk/x}$ looks like your $1-\exp\left(-E_1\frac{U_{(i)}}{U_{(1)}} \right)$ – Henry Sep 05 '23 at 15:25

We can use conditional expectation to transform this probability as follows:
$$\begin{align}
L&:=\mathbb{P}\left(E_i \le E_1\frac{U_{(i)}}{U_{(1)}},\ \forall i\ge2 \right)\\
&=\color{blue}{\frac{1}{2}}\mathbb{P}\left( \left\{ E_i \le E_1\frac{U_{(i)}}{U_{(1)}},\ \forall i\ge2 \right\} \cap \color{blue}{\underbrace{\left\{E' \le E_1\frac{U_{(1)}}{U_{(1)}} \right\}}_{\text{this is a trick to use the beautiful formula in $(5)$}}} \right)\\
&=\color{blue}{\frac{1}{2}}\mathbb{E}\left(\mathbb{E}\left(\prod_{\color{blue}{1}\le i \le n}\mathbf{1}_{\left\{E_i \le E_1\frac{U_{(i)}}{U_{(1)}}\right\}} \,\middle|\, E_1,(U_{(i)})_{1\le i \le n}\right)\right)\\
&=\color{blue}{\frac{1}{2}}\mathbb{E}\left(\prod_{\color{blue}{1}\le i \le n}\mathbb{P}\left(E_i \le E_1\frac{U_{(i)}}{U_{(1)}} \,\middle|\, E_1,(U_{(i)})_{1\le i \le n}\right)\right)\\
&=\color{blue}{\frac{1}{2}}\mathbb{E}\left(\prod_{\color{blue}{1}\le i \le n}\left(1-\exp\left(-E_1\frac{U_{(i)}}{U_{(1)}} \right) \right)\right)\\
&=\color{blue}{\frac{1}{2}}\int_{0}^{+\infty}\mathbb{E}\left(\prod_{\color{blue}{1}\le i \le n}\left(1-\exp\left(-x\frac{U_{(i)}}{U_{(1)}} \right) \right)\right)e^{-x}\,dx\\
&=\color{blue}{\frac{1}{2}}\int_{0}^{1}\mathbb{E}\left(\prod_{\color{blue}{1}\le i \le n}\left(1-y^{\frac{U_{(i)}}{U_{(1)}}} \right)\right)dy \tag{1}
\end{align}$$
where the last line uses the substitution $y=e^{-x}$.

I doubt that $(1)$ can be computed (semi-)analytically.

There is a result concerning the distribution of $\left(\frac{U_{(i)}}{ U_{(1)}}\right)_{i=2,\dots,n}$. From this, one can show that, with $(Z_i)_{i=1,\dots,n-1}$ i.i.d. uniform on $(0,1)$, we have $$\frac{U_{(i)}}{ U_{(1)}} \stackrel{\mathcal{D}}{=} Z_1^{-1}\cdot Z_2 ^{-1/2}\cdots Z_{i-1}^{-1/(i-1)} \hspace{1cm} \forall i=2,\dots,n \tag{2}$$ Using $(2)$, an exact evaluation of $(1)$ requires an $(n+1)$-fold integral, which could be estimated by Monte Carlo simulation, for example.
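The identity $(2)$ is easy to check by simulation. A short Python sketch comparing the two sides empirically for $n=5$, $i=3$ (the function names and the test point $t=4$ are my choices):

```python
import random

rng = random.Random(42)

def cdf_ratio_direct(n, i, t, sims=100_000):
    # empirical P(U_(i) / U_(1) <= t) from sorted uniform samples of size n
    hits = 0
    for _ in range(sims):
        u = sorted(rng.random() for _ in range(n))
        if u[i - 1] <= t * u[0]:
            hits += 1
    return hits / sims

def cdf_ratio_product(i, t, sims=100_000):
    # empirical P(Z_1^{-1} Z_2^{-1/2} ... Z_{i-1}^{-1/(i-1)} <= t)
    hits = 0
    for _ in range(sims):
        r = 1.0
        for j in range(1, i):
            r *= rng.random() ** (-1.0 / j)
        if r <= t:
            hits += 1
    return hits / sims

# the two estimates should agree up to Monte Carlo error (about 0.56 here)
print(cdf_ratio_direct(5, 3, 4.0), cdf_ratio_product(3, 4.0))
```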


Another approach is to approximate $\left(\frac{U_{(i)}}{ U_{(1)}}\right)_{i=\color{blue}{1},\dots,n}$ by their expected values. From $(2)$, one can show that $$\mathbb{E}\left(\frac{U_{(i)}}{ U_{(1)}}\right) = i \hspace{1cm} \forall i=\color{blue}{1},\dots,n$$ and then assume that $$\mathbb{E}\left(\prod_{2\le i \le n}\left(1-y^{\frac{U_{(i)}}{U_{(1)}} } \right)\right) \approx \prod_{2\le i \le n}\left(1-y^{\mathbb{E}\left(\frac{U_{(i)}}{U_{(1)}}\right) } \right)=\prod_{2\le i \le +\infty}\left(1-y^{i } \right) \tag{3}$$

Applying $(3)$ to $(1)$ (the $i=1$ factor in $(1)$ contributes $1-y$), we have $$L \xrightarrow{n\to+\infty} \frac{1}{2}\int_{0\le y \le 1}\left(\prod_{\color{blue}{1}\le i \le +\infty}\left(1-y^{i} \right)\right)dy =: M\tag{4}$$

The integral $M$ in $(4)$ has a closed-form expression, according to this question: $$\int_0^1\prod_{n=1}^\infty(1-x^n)\,dx=\frac{4\pi\sqrt3}{\sqrt{23}}\,\frac{\sinh\frac{\pi\sqrt{23}}3}{\cosh\frac{\pi\sqrt{23}}2}\tag{5}$$
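The closed form in $(5)$ can be checked numerically by truncating the product and integrating with a simple midpoint rule; a Python sketch (truncation depth and grid size are my choices):

```python
import math

def euler_product(x, max_terms=200):
    # truncated prod_{n=1}^{max_terms} (1 - x^n); stop early once x^n is negligible
    p, xn = 1.0, 1.0
    for _ in range(max_terms):
        xn *= x
        if xn < 1e-12:
            break
        p *= 1.0 - xn
    return p

N = 2000  # midpoint rule on (0, 1)
numeric = sum(euler_product((i + 0.5) / N) for i in range(N)) / N

s = math.pi * math.sqrt(23)
closed = 4 * math.pi * math.sqrt(3) / math.sqrt(23) * math.sinh(s / 3) / math.cosh(s / 2)
print(numeric, closed)   # both approximately 0.3684
```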

Consequently, we can approximate $L$ by

$$\color{red}{L \approx \frac{1}{2}\frac{4\pi\sqrt3}{\sqrt{23}}\frac{\sinh\frac{\pi\sqrt{23}}3}{\cosh\frac{\pi\sqrt{23}}2}} \approx 0.1842$$

NN2
  • +1 Nice. But your differential, $dx$, does not match the variable in the integrand, $y$, or the limits of integration $0\leq y\leq 1$, by the way. – Nap D. Lover Aug 31 '23 at 23:15
  • @NapD.Lover Thank you, it's a typo, I just corrected it. – NN2 Aug 31 '23 at 23:17
    I suspect that for all $i$ you have a CDF for $\frac{U_{(i)}}{i\cdot U_{(1)}}$ of $F(x)=(1- \frac1{ix})^{i-1}$ for $\frac1i <x<\infty$ for all $n \ge i$ so it does not converge to $1$. If $i=n$ rather than being constant then this seems to converge in distribution to $F(x) = e^{-1/x}$ on $(0,\infty)$ as $n \to \infty$, again not constant. – Henry Sep 01 '23 at 01:16
  • @Henry A beautiful approximation is found. Please see my last version of this answer. – NN2 Sep 05 '23 at 13:51
  • @NN2 What does that make $M$ or $L$ numerically? I would have thought they should be less than $1$ – Henry Sep 05 '23 at 14:16
  • @Henry $L$ can be approximated analytically by $M$ (which has a closed-form expression). Both are evidently less than $1$. – NN2 Sep 05 '23 at 14:27
  • OK - I missed the final $2$ (misreading it as $3$), so it now seems you are saying $L \approx 0.1842$. This looks close to $\frac13$ of my simulation results – Henry Sep 05 '23 at 14:54
  • @Henry There is only one approximation error in my method, namely $L \approx M$. However, in your proof, we don't know how you get your formula (which is not proved). – NN2 Sep 05 '23 at 14:58
  • @NN2 My simulation does not use a formula beyond all(ei[argmin] * u >= ei * minu) which is the $\mathbb{P}(E_1 U_{(i)}\geq E_i U_{(1)}, \forall i\geq2)$ condition in the original question. I did not try to evaluate my triple integral for $n>2$. – Henry Sep 05 '23 at 15:02
  • @Henry Ok, I'll check carefully your answer and mine. – NN2 Sep 05 '23 at 15:03