
I can't find a solution to the following exercise anywhere on the internet:

Let $X = (X_1, \dots, X_n)$ be a sequence of i.i.d. random variables from an exponential distribution. Find the distribution of the statistic $T = \frac{1}{2}(X_{(1)} + X_{(n)})$, where $X_{(1)}, X_{(n)}$ are the first and last order statistics.

Preparation: $$f_{X_i}(x_i) = \lambda e^{-\lambda x_i}\mathbb{1}_{(0,\infty)}(x_i)$$ $$F_{X_i}(x_i) = 1-e^{-\lambda x_i}, \quad x_i > 0$$ I know the general formula for the joint density of the vector $(X_{(r)},X_{(s)})$ with $r < s$, which for $u < v$ is $$f_{X_{(r)},X_{(s)}}(u,v) = \frac{n!}{(r-1)!(s-r-1)!(n-s)!}F(u)^{r-1}f(u)(F(v)-F(u))^{s-r-1}f(v)(1-F(v))^{n-s}$$

So by using this formula for $X_{(1)}, X_{(n)}$ I have: $$f_{X_{(1)}, X_{(n)}}(x,y) = (n-1)n\lambda^2e^{-\lambda(x+y)}(e^{-\lambda x}-e^{-\lambda y})^{n-2}\mathbb{1}_{(0,\infty)}(x)\mathbb{1}_{(0,\infty)}(y)\mathbb{1}(x \leq y)$$
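A quick numerical sanity check of this joint density in R (the values of $n$ and $\lambda$ and the helper name `f_joint` are purely illustrative, not part of the exercise): integrating it over the region $0 < x < y$ should give something very close to $1$.

n <- 5          # illustrative values, not from the exercise
lambda <- 0.7
f_joint <- function(x, y) {
  n * (n - 1) * lambda^2 * exp(-lambda * (x + y)) *
    (exp(-lambda * x) - exp(-lambda * y))^(n - 2)
}
# integrate over y from x to Inf, then over x from 0 to Inf
inner <- function(x) sapply(x, function(xi)
  integrate(function(y) f_joint(xi, y), lower = xi, upper = Inf)$value)
integrate(inner, lower = 0, upper = Inf)$value   # should be close to 1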

First approach:

The region of integration is $A = \{(x,y) : 0 < x \leq y \leq 2t - x\}$.

$$F_T(t) = P(T \leq t) = P(\frac{1}{2}X_{(1)} + \frac{1}{2}X_{(n)} \leq t) = P(X_{(n)} \leq -X_{(1)}+2t) = \iint_A f_{X_{(1)}, X_{(n)}}(x,y) \,dx\,dy = \\ =\int_{0}^{t}\int_{x}^{-x+2t} (n-1)n\lambda^2e^{-\lambda(x+y)}(e^{-\lambda x}-e^{-\lambda y})^{n-2} \,dy\,dx = \\ =(n-1)n\lambda^2\int_{0}^{t} e^{-\lambda x}\int_{x}^{-x+2t}e^{-\lambda y}(e^{-\lambda x}-e^{-\lambda y})^{n-2} \,dy\,dx$$ I don't know how to calculate this integral.
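Even without a closed form, the setup can at least be checked numerically; here is a rough R sketch (the values of $n$, $\lambda$, $t$ and the helper names `f_joint`, `F_T_num` are illustrative only) comparing the double integral with an empirical estimate of $P(T \le t)$ from simulation.

# rough numerical check of the double integral against simulation
set.seed(1)
n <- 5; lambda <- 0.7; t0 <- 2    # arbitrary illustrative values
f_joint <- function(x, y) {
  n * (n - 1) * lambda^2 * exp(-lambda * (x + y)) *
    (exp(-lambda * x) - exp(-lambda * y))^(n - 2)
}
F_T_num <- function(t) {
  inner <- function(x) sapply(x, function(xi)
    integrate(function(y) f_joint(xi, y), lower = xi, upper = 2 * t - xi)$value)
  integrate(inner, lower = 0, upper = t)$value
}
sims <- replicate(10^5, { X <- rexp(n, lambda); (min(X) + max(X)) / 2 })
c(numeric = F_T_num(t0), empirical = mean(sims <= t0))   # should roughly agree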

Second approach:

The region of integration is $B = \{(x,y) : 0 < x \leq y,\ x + y \leq 2t\}$. $$F_T(t) = P(T \leq t) = P(\frac{1}{2}X_{(1)} + \frac{1}{2}X_{(n)} \leq t) = \iint_B f_{X_{(1)}, X_{(n)}}(x,y) \,dx\,dy = \dots$$ Now using the substitution $u = x,\ v= \frac{1}{2}(x+y)$, so that $x = u,\ y = 2v-u$ and $|J| = \left|\det\begin{bmatrix} 1 & 0\\ -1 & 2 \end{bmatrix}\right| = 2$: $$\dots = \int_{-\infty}^{\infty}\int_{-\infty}^{t} 2f_{X_{(1)}, X_{(n)}}(u,2v-u) \,dv\,du = \int_{-\infty}^{t}\int_{-\infty}^{\infty} 2f_{X_{(1)}, X_{(n)}}(u,2v-u) \,du\,dv$$ So that means: $$f_T(v) = \int_{-\infty}^{\infty} 2f_{X_{(1)}, X_{(n)}}(u,2v-u) \,du = \\ =\int_{-\infty}^{\infty} 2(n-1)n\lambda^2e^{-\lambda(u+2v-u)}(e^{-\lambda u}-e^{-\lambda (2v-u)})^{n-2}\mathbb{1}_{(0,\infty)}(u)\mathbb{1}_{(0,\infty)}(2v-u)\mathbb{1}(u \leq 2v-u)\,du =\\ = 2(n-1)n\lambda^2 e^{-2\lambda v} \int_{0}^{v}(e^{-\lambda u}-e^{-\lambda (2v-u)})^{n-2}\,du, \qquad v > 0,$$ since the indicators force $0 < u \leq v$ and $v > 0$. Again I end up with a similar integral.
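This density can also be checked numerically; a rough R sketch (illustrative $n$, $\lambda$ and helper name `f_T` only): it should integrate to $1$ and track a kernel density estimate from simulation.

set.seed(1)
n <- 5; lambda <- 0.7    # arbitrary illustrative values
f_T <- function(v) sapply(v, function(vi) {
  if (vi <= 0) return(0)
  2 * n * (n - 1) * lambda^2 * exp(-2 * lambda * vi) *
    integrate(function(u) (exp(-lambda * u) - exp(-lambda * (2 * vi - u)))^(n - 2),
              lower = 0, upper = vi)$value
})
integrate(f_T, lower = 0, upper = Inf)$value   # should be close to 1
sims <- replicate(10^5, { X <- rexp(n, lambda); (min(X) + max(X)) / 2 })
plot(density(sims, from = 0)); curve(f_T(x), add = TRUE, col = "red")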

Third approach: Chat GPT gave me the following solution :)

$X_{(1)}=\min(X_1,\dots,X_n), X_{(n)}=\max(X_1,\ldots,X_n)$ - I agree with this so far. $$F_T(t) = P(T \leq t) = P(\frac{1}{2}X_{(1)} + \frac{1}{2}X_{(n)} \leq t) = P(X_{(1)} \leq -X_{(n)}+2t) = \\ =1 - P(X_{(1)} > 2t - X_{(n)}) = \\ =1 - P(X_1>2t-X_n, X_2>2t-X_n, \dots, X_{n-1}>2t-X_n, X_n>2t-X_n)=\\ = 1 - \prod_{i=1}^{n}P(X_i>2t-X_n) = 1-[1-F_{X_i}(2t-X_n)]^n= 1-[1-\int_{2t}^{\infty} \lambda e^{\lambda(2t-x)}\,dx]^n = \\ = 1- [e^{-\lambda t}]^n = 1-e^{-n \lambda t}$$ $$f_T(t) = \frac{d}{dt}(1-F_T(t)) = \frac{d}{dt}(e^{-n\lambda t}) = n\lambda e^{-n \lambda t}$$ I do not understand why we suddenly go from $X_{(1)}>2t-X_{(n)}$ to $X_1>2t-X_n, \dots, X_n>2t-X_n$.
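The claimed CDF $1-e^{-n\lambda t}$ can at least be probed with a short simulation in R (values of $n$, $\lambda$, $t$ chosen arbitrarily for illustration) by comparing it with an empirical estimate of $P(T \le t)$:

set.seed(1)
n <- 5; lambda <- 0.7; t0 <- 2    # arbitrary illustrative values
sims <- replicate(10^5, { X <- rexp(n, lambda); (min(X) + max(X)) / 2 })
c(claimed = 1 - exp(-n * lambda * t0), empirical = mean(sims <= t0))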

Summary:

My questions are:

  1. If my first and second approaches are correct, how do I solve the integrals at the end?
  2. Is ChatGPT's solution correct?
  3. If none of these approaches is correct, then I would be grateful for a solution.
  • See https://math.stackexchange.com/questions/80475/order-statistics-of-i-i-d-exponentially-distributed-sample for a hint. – kimchi lover Apr 18 '23 at 22:01
  • Thank you for your comment! Unfortunately, I have already seen the discussion in the linked post, but I do not understand how it could help me. – Hedgehog Apr 19 '23 at 19:22

2 Answers


The distribution of $T$ does not seem easy, though the mean is relatively easy and the variance can be handled with memorylessness. I think memorylessness suggests that $\frac{1}{2}(X_{(n)} - X_{(1)})$ has the same distribution as the maximum of $n-1$ i.i.d. exponential random variables with rate $2\lambda$, and is independent of $X_{(1)}$, which has an exponential distribution with rate $n\lambda$; their sum is $T$. So I think you can say (see also the simulation sketch after the list below)

  • $\mathbb{E}(T)=\frac{1}{2\lambda}\left(\frac1n+\sum\limits_{k=1}^n \frac1k\right)=\frac{1}{2\lambda}(\frac1n+H_n)$ which as $n$ increases is close to the slowly increasing $\frac{\log_e(n) +\gamma}{2\lambda}$
  • $\text{Var}(T)=\frac{1}{4\lambda^2}\left(\frac3{n^2}+\sum\limits_{k=1}^n \frac1{k^2}\right)$ which as $n$ increases is close to the constant $\frac{\pi^2}{24\lambda^2}$
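Here is a rough simulation sketch of that decomposition (the values of $n$, $\lambda$ and the variable names are illustrative only): $\frac{1}{2}(X_{(n)}-X_{(1)})$ should look like the maximum of $n-1$ i.i.d. exponentials with rate $2\lambda$, and should be roughly uncorrelated with $X_{(1)}$.

set.seed(1)
n <- 4; lambda <- 0.1; reps <- 10^5   # illustrative values
X <- matrix(rexp(reps * n, lambda), nrow = reps)
half_range <- (apply(X, 1, max) - apply(X, 1, min)) / 2
max_alt <- apply(matrix(rexp(reps * (n - 1), 2 * lambda), nrow = reps), 1, max)
ks.test(half_range, max_alt)$p.value   # large p-values are consistent with equal distributions
cor(apply(X, 1, min), half_range)      # should be close to 0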

The CDF integral I get is

$$\mathbb P(T \le t)=\int_{x=0}^t n\lambda e^{-n\lambda (t-x)}\left(1-e^{-2\lambda x}\right)^{n-1} \, dx$$

which for specific $n$ can be expanded and integrated. Doing this and simplifying seems to give the CDF as $1$ minus an $(n-1)$-degree polynomial in $e^{-2\lambda t}$ plus an extra term involving $e^{-n\lambda t}$ whose form appears to depend on the parity of $n$. For example (a numerical check of these two cases follows the list below):

  • when $n=3$, $\mathbb P(T \le t)=1-\left(6 {{e}^{-2 \lambda t}}+3 {{e}^{-4 \lambda t}}\right)+8 {{e}^{-3 \lambda t}}$
  • when $n=4$, $\mathbb P(T \le t)=1-\left(6 {{e}^{-2 \lambda t}}-3 {{e}^{-4 \lambda t}}-2 {{e}^{-6 \lambda t}}\right)+12 \lambda t\,{{e}^{-4 \lambda t}}$
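As a rough check of these two cases, the closed forms can be compared with the integral above evaluated numerically at an arbitrary point (the values of $\lambda$ and $t$ and the helper name `cdf_int` are illustrative only):

# compare the n = 3 and n = 4 closed forms with the integral, at an arbitrary t
lambda <- 0.1; t0 <- 7
cdf_int <- function(t, n) integrate(function(x)
  n * lambda * exp(-n * lambda * (t - x)) * (1 - exp(-2 * lambda * x))^(n - 1),
  lower = 0, upper = t)$value
cdf3 <- 1 - (6 * exp(-2 * lambda * t0) + 3 * exp(-4 * lambda * t0)) + 8 * exp(-3 * lambda * t0)
cdf4 <- 1 - (6 * exp(-2 * lambda * t0) - 3 * exp(-4 * lambda * t0) - 2 * exp(-6 * lambda * t0)) +
  12 * lambda * t0 * exp(-4 * lambda * t0)
c(cdf_int(t0, 3), cdf3)   # should match
c(cdf_int(t0, 4), cdf4)   # should match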

To illustrate this with a simulation in R, consider $n=4$ and $\lambda=0.1$:

n <- 4
lambda <- 0.1
set.seed(2023)

mid <- function(samples, rate) {
  X <- rexp(samples, rate)
  return((min(X) + max(X)) / 2)
}

sims <- replicate(10^5, mid(samples=n, rate=lambda))

Allowing for simulation noise, this seems to agree with the mean and variance formulas above:

mean(sims)
# 11.65139
(1/n + sum(1/(1:n))) / (2*lambda) 
# 11.66667
(log(n) + 0.5772156649 + 3/(2*n)) / (2*lambda)
# 11.69255

var(sims)
# 40.05136
(3/n^2 + sum(1/(1:n)^2)) / (4*lambda^2)
# 40.27778
pi^2 / (24*lambda^2)
# 41.12335

and the theoretical CDF in red matches the simulated empirical CDF in black

plot.ecdf(sims)
curve(1 - (6*exp(-2*x*lambda)-3*exp(-4*x*lambda)-2*exp(-6*x*lambda)) +
          12*x*lambda*exp(-4*x*lambda), 
      from=0, to=max(sims), col="red", add=TRUE) 

[Figure: simulated (black) and theoretical (red) CDFs]

Henry
  • "Thank you for your answer! I found this exercise in the textbook recommended for practicing before the exam. I cannot confirm whether your solution is correct, but from what I can see in your answer, there is no general formula for any n because it's too hard to calculate, right?" – Hedgehog Apr 19 '23 at 19:18
  • You could do a binomial expansion inside the integral, giving a sum, and then integrate each term of the sum, leaving the result as a not very attractive sum of terms – Henry Apr 19 '23 at 21:43

In the standard case $\lambda=1$, with $X$ and $Y$ representing the first and final order statistics respectively, their joint density is \begin{equation} f(x, y) = n(n - 1)(e^{-x} - e^{-y})^{n-2}e^{-x}e^{-y} \end{equation} for $0 < x < y$, zero otherwise. The sum $Z = X + Y$ has density \begin{equation} f_Z(z) = \int_{-\infty}^\infty\, f(x, z-x)\, dx \end{equation} which in this case works out to be \begin{equation} f_Z(z) = \int_{0}^{z/2}\, f(x, z-x)\, dx \end{equation} for $z>0$ and zero otherwise. (The statistic in the question is $T = Z/2$, with density $f_T(t) = 2 f_Z(2t)$.)

So, for example, \begin{align} n = 3: &\quad f_Z(z) = 6e^{-z} - 12e^{-3z/2} + 6e^{-2z} \\ n = 4: &\quad f_Z(z) = 6e^{-z} - 12ze^{-2z} - 6e^{-3z} \end{align}
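For illustration (the sample size of the simulation and the helper name `f_Z4` are arbitrary), the $n=4$ density above can be compared with a kernel density estimate of $Z = X_{(1)} + X_{(n)}$ simulated at rate $1$:

set.seed(1)
n <- 4
z <- replicate(10^5, { X <- rexp(n, 1); min(X) + max(X) })
f_Z4 <- function(z) 6 * exp(-z) - 12 * z * exp(-2 * z) - 6 * exp(-3 * z)
plot(density(z, from = 0))
curve(f_Z4(x), add = TRUE, col = "red")   # should track the simulated density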