
Given a sequence of iid random variables $X_i$ (without loss of generality from $U(0,1)$), an integer $k \ge 1$ and some $p \in (0,1)$, construct the sequence of random vectors $Z^{(j)}$, $j=0,1,...$ in the following way. Let

$$Z^{(0)}=(X_{(1)},...,X_{(k)}),$$

where $X_{(l)}$ is the $l$-th order statistic of the sample $\{X_1,...,X_k\}$. Introduce the notation

\begin{align} Z^{(j)}&=(Z_{j,1},...,Z_{j,k}),\\ m_j&=\min(Z_{j-1,1},...,Z_{j-1,k},X_{k+j}),\\ M_j&=\max(Z_{j-1,1},...,Z_{j-1,k},X_{k+j}) \end{align}

Then

$$Z^{(j)}=(Y_{(1)},...,Y_{(k)})$$

where $Y_{(l)}$ is the $l$-th order statistic of the following set:

  1. The set $\{Z_{j-1,1},...,Z_{j-1,k},X_{k+j}\}\setminus\{m_j\}$ with probability $p$
  2. The set $\{Z_{j-1,1},...,Z_{j-1,k},X_{k+j}\}\setminus\{M_j\}$ with probability $1-p$

The decision between cases 1. and 2. is made independently from the $X_i$ (and hence from the $Z^{(i)}$).

The $Z^{(j)}$ are supported on the $k$-dimensional simplex $S_k = \{(x_1, \dots, x_k) \in \mathbb{R}^k \, | \, 0 \le x_1 \le x_2 \le \dots \le x_k \le 1 \}$.
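
For concreteness, here is a minimal simulation sketch of the process in Python (the helper name `simulate_chain`, the NumPy usage, and all parameter values are illustrative choices, not part of the problem statement):

```python
import numpy as np

def simulate_chain(k, p, n_steps, rng=None):
    """Run n_steps transitions of the process and return Z^(n_steps)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.sort(rng.uniform(size=k))                 # Z^(0): sorted sample of k uniforms
    for _ in range(n_steps):
        pool = np.sort(np.append(z, rng.uniform()))  # adjoin X_{k+j} and re-sort
        # Independently of the X_i: drop the minimum m_j with probability p,
        # otherwise drop the maximum M_j.
        z = pool[1:] if rng.uniform() < p else pool[:-1]
    return z

# Example: a crude look at the marginals of Z^(500) for k = 3, p = 0.3.
rng = np.random.default_rng(0)
draws = np.array([simulate_chain(3, 0.3, 500, rng) for _ in range(2000)])
print(draws.mean(axis=0))  # empirical means of the k components
```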

It appears that the $Z^{(j)}$ converge in distribution. Is this known? Is anything known about the limiting distribution?

For the case $k=1$, the answer is the following. Denote the cdf of $Z^{(j)}$ by $F_j$.

The cdf of $\min(X_{n+2},Z^{(n)})$ (for the $U(0,1)$ case, with $X_{n+2}$ the new observation, independent of $Z^{(n)}$) is

$$x+F_n(x)-xF_n(x),$$

since $P(\min(X_{n+2},Z^{(n)})>x)=(1-x)(1-F_n(x))$, and the cdf of $\max(X_{n+2},Z^{(n)})$ is

$$xF_n(x).$$

Hence

\begin{align} F_{n+1}(x)&=p(x+F_n(x)-xF_n(x))+(1-p)xF_n(x)\\ &=px+(p(1-x)+(1-p)x)F_n(x) \end{align}

Since $p(1-x)+(1-p)x\in(0,1)$ for all $x\in[0,1]$, the recursion is a contraction in $F_n(x)$ and converges to its fixed point:

$$\lim_{n\to\infty} F_{n}(x)=\frac{px}{1-p(1-x)-(1-p)x}$$
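
This is easy to check numerically; a quick sketch (the grid, the value of $p$, and the iteration count are arbitrary choices):

```python
import numpy as np

# Iterate F_{n+1}(x) = p*x + (p*(1-x) + (1-p)*x) * F_n(x) on a grid,
# starting from F_0(x) = x (the cdf of Z^(0) = X_1 ~ U(0,1)),
# and compare with the claimed limit p*x / (1 - p*(1-x) - (1-p)*x).
p = 0.3
x = np.linspace(0.0, 1.0, 201)
F = x.copy()
for _ in range(200):
    F = p * x + (p * (1 - x) + (1 - p) * x) * F
limit = p * x / (1 - p * (1 - x) - (1 - p) * x)
print(np.max(np.abs(F - limit)))  # tiny: geometric convergence at rate max(p, 1-p)
```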

I am looking for general results (case $k>1$) either for the limiting distribution of the whole vector $Z^{(j)}$ or of some of its components (marginal distributions).

Hans Engler

2 Answers


EDIT: I originally had a different answer to this question based on Markov chains, but then I saw a different and better way of approaching the problem, so I changed my answer. I can post the old one back if people want it.

Now $Z^{(m)}$ will be a function of $X_1,\dots,X_{k+m-1}$. In particular, it will be a function of the order statistics of this set.

For uniformly distributed variables $X_i$, if we take a collection of $k+m-1$ of them, the $r$th order statistic has a beta distribution:

$$X_{(r)}\sim \mathrm{Beta}(r,k+m-r)$$
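
This is the standard fact that the $r$th order statistic of $n$ iid uniforms is $\mathrm{Beta}(r,\,n+1-r)$, with $n=k+m-1$ here. A quick empirical check (sample size, seed, and parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

# Check X_(r) ~ Beta(r, k+m-r) for n = k+m-1 iid U(0,1) samples.
k, m, r = 3, 5, 2
n = k + m - 1
rng = np.random.default_rng(0)
samples = np.sort(rng.uniform(size=(100_000, n)), axis=1)[:, r - 1]
# Kolmogorov-Smirnov test against Beta(r, k+m-r); a large p-value
# indicates consistency with the claimed distribution.
print(stats.kstest(samples, stats.beta(r, k + m - r).cdf))
```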

Now what will the process have done by the $m$th step? It will have removed the top $n_t$ order statistics and the bottom $n_b$, with the constraint that $n_t+n_b=m-1$. Stated this way, the probability of any such set occurring is given by the binomial distribution:

$$Pr(n_t)={m-1 \choose n_t}p^{n_t}(1-p)^{m-1-n_t}$$

Given that $n_t$ is the number chosen, we have $Z^{(m)}=(X_{(m-n_t)},\dots,X_{(k+m-1-n_t)})$. The joint distribution of $Z^{(m)}$ then looks like:

$$P(Z^{(m)}|n_t)=\frac{(k+m-1)!X_{(m-n_t)}^{m-1-n_t}(1-X_{(k+m-1-n_t)})^{n_t}}{(m-1-n_t)!(n_t)!}dX_{(m-n_t)}\dots dX_{(k+m-1-n_t)}$$

This is a Dirichlet distribution with parameters $(m-n_t,1,1,\dots,1,n_t+1)$, with $k-2$ ones in the middle (a general result for the joint distribution of order statistics; I am not sure how widely known it is).

Now we need to sum out $n_t$ with respect to its probability:

$$P(Z^{(m)})=\sum_{n_t=0}^{m-1}P(n_t)P(Z^{(m)}|n_t)$$ $$=\frac{(k+m-1)!}{(m-1)!}\sum_{n_t=0}^{m-1}{m-1 \choose n_t}^{2}\left[(1-X_{(k+m-1-n_t)})p\right]^{n_t}\left[X_{(m-n_t)}(1-p)\right]^{m-1-n_t}$$

This looks tantalizingly close to a binomial distribution, if it weren't for that annoying squared term in the summation. I don't know an exact solution for this summation; this is as far as I could get. You would likely require numerical work for this.
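
For what it's worth, here is a sketch that evaluates the summation exactly as displayed (the function name `density_formula` is mine; note that the comments below dispute whether the counting argument behind the formula is valid):

```python
import math

def density_formula(z, m, k, p):
    """Evaluate the displayed expression for P(Z^(m)) at a sorted point z,
    taking z[0] as X_(m-n_t) and z[-1] as X_(k+m-1-n_t).  This follows the
    formula as written; it is a numerical sketch, not a validated density."""
    x_lo, x_hi = z[0], z[-1]
    total = 0.0
    for n_t in range(m):  # n_t = 0, ..., m-1
        c = math.comb(m - 1, n_t)
        total += (c * c
                  * ((1.0 - x_hi) * p) ** n_t
                  * (x_lo * (1.0 - p)) ** (m - 1 - n_t))
    return math.factorial(k + m - 1) / math.factorial(m - 1) * total

# Example evaluation at an arbitrary sorted point for k = 3, m = 5, p = 0.3.
print(density_formula([0.2, 0.5, 0.8], m=5, k=3, p=0.3))
```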

  • +1, good idea, Markov is probably the way to go, but still there is a lot of details to fill in to get to the final answer. – mpiktas Apr 07 '11 at 14:01
  • @mpiktas - you should prob remove the +1 unless you think my new answer is still good. This is allowed right? or should I have posted a second answer? – probabilityislogic Apr 07 '11 at 14:57
  • @probabilityislogic maybe I'm not following your argument, but consider the $k=2$ case where the first four observations are 0.5, 0.4, 0.6, and 0.2. Assume the coin flips are such that we discard the min at step 3 and discard the max at step 4. This leaves the set $\{0.2, 0.5\}$ which does not match your description, i.e., it has a "hole" in the middle of the order statistics of the four observations. – cardinal Apr 07 '11 at 15:18
  • So this corresponds to $n_t=1$ in the conditional distribution. We have $m=3$ after two draws, and you have $X_{(2+3-1-1)}=X_{(3)}=0.5$ and $X_{(3-1)}=X_{(2)}=0.2$ as the particular realisations. Does this help? – probabilityislogic Apr 07 '11 at 15:39
  • the $\dots$ in $dX_{(m-n_t)}\dots dX_{(k+m-1-n_t)}$ represents the differentials of the other order statistics, so we have $dX_{(m-n_t)}\dots dX_{(k+m-1-n_t)}=dX_{(m-n_t)}dX_{(m-n_t+1)}dX_{(m-n_t+2)}\dots dX_{(k+m-2-n_t)}dX_{(k+m-1-n_t)}$. The joint pdf is uniform in these order statistics "in between" the extremes. – probabilityislogic Apr 07 '11 at 15:43
  • This is why the dirichlet distribution is used, with all the $1$ parameters – probabilityislogic Apr 07 '11 at 15:44
  • @probabilityislogic: Thanks for both posts (I read the first one). I think I can see now how to rewrite this process as a Markov chain on the state space $S_k = \{(x_1, \dots, x_k) \,|\, 0 \le x_1 \le x_2 \le \dots \le x_k \le 1\}$. Your closed formula and the approach look very interesting. But it'll be hard to get an idea of the stationary limit or of the convergence rate, I think. –  Apr 07 '11 at 15:45
  • Sorry, that should be $X_{(2)}=0.4$, not $X_{(2)}=0.2$. If you take the min and max out of $\{0.2,0.4,0.5,0.6\}$ you get left with $\{0.4,0.5\}$, not $\{0.2,0.5\}$. – probabilityislogic Apr 07 '11 at 15:48
  • @Hans - I was thinking that you could take the maximum term in the sum as an approximation, which should work really well due to the squared combinatorial term for large $m$ (entropy style). Maybe that will help. – probabilityislogic Apr 07 '11 at 15:50
  • @probabilityislogic - the point that cardinal observed is that the $Z^{(m)}$ is not necessarily of the form $(X_{(m-n_t)}, \dots, X_{(m-n_t+k-1)})$. It's not formed by deleting the $n_t$ largest and $(m-1-n_t)$ smallest entries from $(X_{(1)}, \dots, X_{(m+k-1)})$, where $n_t \sim B(m-1,p)$. –  Apr 07 '11 at 15:58
  • @Hans - ah yes, I see now. This may actually make things simpler (or harder), because the binomial coefficient disappears from the equation, but it now depends on the order. Will have to do some more thinking, I reckon. – probabilityislogic Apr 07 '11 at 16:03
  • @probabilityislogic, when you have a new answer it is perfectly ok to post it as new one, although SE engine actively discourages it. You have proposed 2 different solutions, so it is ok to present them both. Overwriting the old answer (IMHO) is appropriate when the new edit is continuation of the old one. – mpiktas Apr 08 '11 at 10:33
  • @probabilityislogic, I removed the upvote, since due to bounty, the stakes are higher now :) – mpiktas Apr 11 '11 at 10:22
  • Downvoted. (Sorry, @probabilityislogic, I do so only grudgingly.) The approach, as currently given, does not analyze the process correctly. – cardinal Apr 18 '11 at 12:04
  • @cardinal - yeah, I'm thinking I should remove this answer – probabilityislogic Apr 18 '11 at 12:58
  • @probabilityislogic if you decide to go that route, let me know and I'll remove the downvote before you delete. Another option is to edit again and make clear that an inconsistency is present. Maybe someone can modify the approach and make it work. I tried a couple of different things, but couldn't find any traction. – cardinal Apr 18 '11 at 13:05

I've asked this question on MathOverflow. It has one answer, which I have not found time to check, so I am giving the link as an answer here.

mpiktas