
If $X_1, \dots, X_n$ are iid random variables from the Uniform$[0,\theta]$ distribution, where $\theta > 0$, compute the expectation of the largest order statistic, denoted $X_{(n)}$.

I am looking to test whether this statistic is a biased or unbiased estimator of $\theta$; however, I am struggling to assess the bias because I am unable to compute its expectation.

My initial thought was that $E_{\theta}(X_{(n)})=\frac{\theta}{2}$, since this is the expectation of any random variable uniformly distributed on this interval. However, I can see that this clearly cannot be the case here, since the expectation must (intuitively) depend on $n$ in some way.

I am wondering what the problem is with my initial thoughts.

Edit: I have been informed in the comments of the correct approach; however, I am still unclear about what is wrong with my reasoning that $E_{\theta}(X_{(n)})=\frac{\theta}{2}$, given that $E_{\theta}(X_i)=\frac{\theta}{2}$ for every $i$. By definition, there exists some index $j$ such that $X_j=X_{(n)}$, so why doesn't $X_j$ follow the uniform distribution when every $X_i$ does?
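
For completeness, here is where the approach from the comments leads (filling in the standard steps). Since the $X_i$ are independent, $$\operatorname{Pr}_{\theta}[X_{(n)} \le c] = \operatorname{Pr}_{\theta}[X_1 \le c, \dots, X_n \le c] = \left(\frac{c}{\theta}\right)^n, \qquad 0 \le c \le \theta,$$ so differentiating gives the density $f_{X_{(n)}}(c) = \frac{n c^{n-1}}{\theta^n}$ and $$E_{\theta}(X_{(n)}) = \int_0^{\theta} c \cdot \frac{n c^{n-1}}{\theta^n}\, dc = \frac{n}{n+1}\,\theta.$$ So $X_{(n)}$ is biased (though $\frac{n+1}{n} X_{(n)}$ is unbiased), and its expectation does indeed depend on $n$.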

FD_bfa
  • Compute the expected value using the PDF. To find that, consider the CDF of $X_{(n)}$, i.e. $\operatorname{Pr}[X_{(n)} \le c]$. Hint: the max being $\le c$ means they all are. – Joe May 22 '22 at 01:32
  • The problem with your initial thoughts: $X_{(n)}$ will not have a uniform distribution. Intuitively, it would be stochastically larger than a random variable with a uniform distribution on $[0,\theta]$, i.e., its distribution will put more mass near $\theta$ than near $0$. – passerby51 May 22 '22 at 01:37
  • https://math.stackexchange.com/questions/60497/unbiased-estimator-for-a-uniform-variable-support – angryavian May 22 '22 at 01:37
  • I understand the intuition behind what you've said, but doesn't that contradict the assumption that each $X_i$ follows the uniform distribution? So we would say that the expectation of each $X_i$ is equal to $\frac{\theta}{2}$, and yet we can't say that $X_{(n)}$ has this expectation (even though it is equal to one of the $X_i$s)? @passerby51 – FD_bfa May 22 '22 at 01:49
  • @Joe Thanks, that makes sense as an approach; however, I'm still unsure why $X_{(n)}$ doesn't just follow the uniform distribution. If we are able to say that each $X_i$ has the expectation of the uniform distribution and that $X_{(n)}$ is equal to one of the $X_i$s, then surely it has the same expectation? – FD_bfa May 22 '22 at 01:53
  • @callculus42 No, that doesn't answer my question. I understand the derivation they have used there. The issue that I have is that I don't understand why we cannot say that the expectation is equal to $\frac{\theta}2$. There isn't any reference to that problem on any other questions that have been posted before (as far as I can tell). – FD_bfa May 22 '22 at 01:58
  • @FD_bfa Isn't it obvious that $\max(X_1,\dots,X_n)$ is a different random variable from any single $X_i$, and therefore that their expectations can differ? – callculus42 May 22 '22 at 02:06
  • Let's say we have $X_1, \dots, X_{10}$, which are iid Uniform$[0, \theta]$. We are comfortable saying that $E(X_i)=\frac{\theta}{2}$. Let's say that $X_7$ (for example) is our maximum value, so $X_{(n)}=X_7$. We have just said that each $X_i$ has expectation $\frac{\theta}{2}$, and yet now $X_7$ no longer has this as its expectation. $X_7$ is definitely one of the $X_i$s. I am confused about why this is the case, given that there now seem to be two (seemingly) contradictory statements @callculus42 – FD_bfa May 22 '22 at 02:12
  • $X_7$ alone still has an expectation of $\theta/2$. – callculus42 May 22 '22 at 02:19
  • @FD_bfa, There is no contradiction. You are just seeing one of the wonders of probability. $X_i$ is uniformly distributed for any fixed $i$. But $X_{(n)} = X_J$ for a random index $J$ that depends on $X_1,\dots,X_n$. There is no reason why this should have the same distribution as any $X_i$ for a fixed $i$. Once you understand this, you basically graduate in your understanding of probability to the next level! – passerby51 May 22 '22 at 02:22
  • @callculus42 if $X_7$ has expectation $\frac{\theta}2$ and $X_7=X_{(n)}$ then doesn't this directly imply that $E(X_{(n)})$ is the same? – FD_bfa May 22 '22 at 02:22
  • @FD_bfa What you're describing is a conditional expectation: $E(X_7|X_7=X_{(n)})$ – callculus42 May 22 '22 at 02:30
  • Suppose you have three observations and order them so that $X_{(1)} \le X_{(2)} \le X_{(3)}$ (in fact they will almost surely be distinct). By symmetry, the middle one has expectation $E[X_{(2)}]=\frac{\theta}{2}$, and the largest one is almost surely larger, so $E[X_{(3)}] >\frac{\theta}{2}$. Similar arguments work for any $n\ge 2$, though they need a slight extension when $n$ is even: argue by symmetry that $E[X_{(1)}]+E[X_{(n)}]=\theta$ and that $X_{(n)} > X_{(1)}$ almost surely, so $E[X_{(n)}]> E[X_{(1)}]$ and thus $E[X_{(n)}]>\frac{\theta}{2}$. – Henry May 22 '22 at 03:02
  • Maybe the answer is clear to you already, but since you asked me, I'd say consider if I have 10 cards, with each value 1-10 appearing once. If I shuffle the deck, and put the cards down in a row, we can say that each $X_i$ (the value of card $i$) has a discrete uniform distribution over the values 1-10 (before the shuffle for frequentists, or even after for Bayesians if they are face down). However, $X_{(10)}$ does not have the same distribution. – Joe May 22 '22 at 06:40
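
A quick Monte Carlo sketch makes the $n$-dependence discussed above concrete (Python/NumPy, with the illustrative choice $\theta = 1$): the sample mean of the maximum tracks $\frac{n}{n+1}\theta$ from the computation above rather than $\frac{\theta}{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0  # illustrative choice of the true parameter

for n in [1, 2, 5, 10, 100]:
    # 100,000 replications of an iid Uniform[0, theta] sample of size n
    samples = rng.uniform(0.0, theta, size=(100_000, n))
    est = samples.max(axis=1).mean()  # Monte Carlo estimate of E[X_(n)]
    print(f"n = {n:3d}: E[X_(n)] ~ {est:.4f}, n/(n+1)*theta = {n / (n + 1) * theta:.4f}")
```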

1 Answer


Suppose we have two random variables $X_1$ and $X_2$, uniformly distributed in $[0,1]$ and independent of each other.

Let $X_{(2)} = \max(X_1,X_2)$. Someone claims that $X_{(2)}$ is equal to either $X_1$ or $X_2$, so it should have the same distribution as each of them, that is, uniform on $[0,1]$.

This is not correct. Let $J$ be the index of the maximum, that is, $J = 1$ if $X_1 > X_2$ and $J=2$ if $X_2 > X_1$. If $X_1 = X_2$, we can define $J$ arbitrarily, so let $J = 1$ in that case, that is, \begin{align*} J = \begin{cases} 1 & X_1 \ge X_2 \\ 2 & X_1 < X_2. \end{cases} \end{align*} We have $X_{(2)} = X_J$. The difficulty is that $J$ depends on $(X_1,X_2)$.

Let us compute $\mathbb E X_{(2)}$ by first conditioning on $J$. We have \begin{align*} \mathbb E ( X_{(2)} \,|\, J = 1) &= \mathbb E ( X_1 \,|\, J= 1) \\ &= \mathbb E ( X_1 \,|\, X_1 \ge X_2). \end{align*} Unconditionally, $X_1$ is uniformly distributed on $[0,1]$, but conditional on $X_1 \ge X_2$ its distribution changes. The easiest way to see this is to note that the joint distribution of $(X_1,X_2)$ conditional on $X_1 \ge X_2$ is uniform on the triangle in $\mathbb R^2$ with vertices $(0,0)$, $(1,0)$ and $(1,1)$: it is the restriction of the uniform distribution on $[0,1]^2$ to the set $T := \{(x_1,x_2): x_1 \ge x_2\}$.

This gives \begin{align*} \mathbb E [X_1 \mid X_1 \ge X_2] &= \frac1{\text{area}(T)}\int_{T} x_1 \, dx_1 dx_2 \\ &= \frac1{1/2} \int_0^1 \int_{x_2}^1 x_1 dx_1 dx_2 = \int_0^1 (1-x_2^2) dx_2 = 1 - \frac13 = \frac23. \end{align*} Note that this is bigger than $1/2$. By symmetry we also have $\mathbb E[X_2\mid X_2 > X_1] = 2/3$. Then, $$ \mathbb E[X_{(2)}] = \frac12 \mathbb E[X_1 | X_1 \ge X_2] + \frac12 \mathbb E [X_2 | X_2 > X_1] = \frac23. $$
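
As a sanity check, these $2/3$ values are easy to confirm numerically; here is a minimal sketch in Python/NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(size=1_000_000)  # X1 ~ Uniform[0, 1]
x2 = rng.uniform(size=1_000_000)  # X2 ~ Uniform[0, 1], independent of X1

j1 = x1 >= x2  # the event {J = 1}, i.e. X1 is the maximum
print("E[X1 | X1 >= X2] ~", x1[j1].mean())              # about 2/3
print("E[max(X1, X2)]   ~", np.maximum(x1, x2).mean())  # also about 2/3
```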

TL;DR: Knowledge of the index $J$, that is, of which variable is the maximum, changes the joint distribution of $(X_1,X_2)$. In this case, once you know $J$, $X_1$ and $X_2$ are no longer independent! The moment you say "I know that $X_1$ is the maximum," you have changed the distribution of $(X_1,X_2)$. That is what conditioning does in general.
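
The loss of independence is also easy to see numerically: restricted to the triangle $T$, the two coordinates are positively correlated (the exact conditional correlation works out to $1/2$). A short sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(size=1_000_000)
x2 = rng.uniform(size=1_000_000)

j1 = x1 >= x2  # condition on knowing that X1 is the maximum
print("corr(X1, X2), unconditional: ", np.corrcoef(x1, x2)[0, 1])          # about 0
print("corr(X1, X2), given X1 >= X2:", np.corrcoef(x1[j1], x2[j1])[0, 1])  # about 0.5
```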

passerby51
  • Thanks. That makes a lot of sense. I hadn't thought about the fact that we lose the right to say that the random variables are independent once we know which is the maximum. – FD_bfa May 22 '22 at 03:26