Take two independent random variables $X_1, X_2$, each uniformly distributed over $[0,1]$, and let $U= \min (X_1, X_2).$ There is a standard procedure to compute the density of $U$: $p_U(u) = 2(1-u).$ What I would like to find is the joint distribution of $X_1$ and $U$. It's probably a standard question, and a reference is fine for me; I just couldn't find it by myself. What I tried is something informal: $P(X_1 = x, U = u) = P(X_1 = x \mid U = u)\,2(1-u)$. But this is something strange, since, by symmetry, I would say that $P(X_1 = u \mid U = u) = 0.5,$ and so my guess for $P(X_1 = x \mid U = u)$ is $\frac{1}{2} 1_{\{u\}}(x) + \frac{1}{2(1-u)} 1_{(u,1]}(x),$ which looks somehow wrong.
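(Before the formal treatment below, a quick Monte Carlo sanity check may help; this is a minimal sketch assuming `numpy`, with the seed, sample size, and bin count chosen arbitrarily. It confirms both $p_U(u) = 2(1-u)$ and the symmetry claim $P(X_1 = U) = 1/2$.)

```python
import numpy as np

# Simulate (X1, X2) iid uniform on [0, 1] and U = min(X1, X2).
rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.uniform(size=n)
x2 = rng.uniform(size=n)
u = np.minimum(x1, x2)

# Symmetry claim: P(X1 = U) = P(X1 <= X2) = 1/2.
print(np.mean(x1 == u))  # ~0.5 (np.minimum returns one of its inputs exactly)

# Density claim: a histogram of U should match p_U(u) = 2(1 - u).
hist, edges = np.histogram(u, bins=20, range=(0.0, 1.0), density=True)
mids = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - 2 * (1 - mids))))  # small Monte Carlo error
```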
-
Indeed the conditional distribution of $X_1$ conditioned on $U=u$ is neither discrete nor continuous, having a discrete part which is half of the Dirac mass at $u$ and a continuous part which is half of the uniform distribution on $(u,1)$. That is, your final formula... (There are more rigorous ways to write this down and to reach it than the one in your post, I have to add.) – Did Nov 01 '15 at 09:29
-
Yes, there are more formal ways. The real exercise is to calculate the regular conditional probability of $X_1$ given $U$, but I can't find the answer formally. So I just need something informal to make a guess and then try to prove that it's right. But I am very interested in formal ways to find the answer, which I don't know. – Kore-N Nov 01 '15 at 09:35
-
For a rigorous approach, try to adapt this. Note that for every measurable bounded test function $g$, $$E(g(X_1,U))=\int_0^1g(x,x)(1-x)dx+\int_0^1\int_0^xg(x,u)dudx,$$ that is, $$E(g(X_1,U))=\int_0^1\int_0^1g(x,u)((1-u)\delta_u(dx)du+\mathbf 1_{u<x}dudx),$$ thus, the joint distribution of $(X_1,U)$ is $$(1-u)\delta_u(dx)du+\mathbf 1_{u<x}dudx.$$ To find the conditional distribution $\mu_u$ of $X_1$ conditioned on $U=u$, freeze $u$ in this formula. Thus, ... – Did Nov 01 '15 at 09:48
-
... $\mu_u$ is proportional to $(1-u)\delta_u(dx)+\mathbf 1_{u<x}dx$, whose total mass is $2(1-u)$, hence finally, $$\mu_u(dx)=\frac12\delta_u(dx)+\frac12\frac{\mathbf 1_{u<x}}{1-u}dx.$$ – Did Nov 01 '15 at 09:48
-
This formula simply means that, for every measurable bounded $g$, $$E(g(X_1)\mid U=u)=\frac12g(u)+\frac12\frac1{1-u}\int_u^1g(x)dx,$$ which entirely characterizes the desired conditional distribution. – Did Nov 01 '15 at 09:54
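(The displayed identity can be checked numerically by conditioning on $U$ landing in a narrow window around $u$; a sketch assuming `numpy`, with the test function $g(x) = x^2$, the point $u = 0.3$, and the window width all chosen arbitrarily.)

```python
import numpy as np

# Approximate E(g(X1) | U = u) by averaging g(X1) over samples with
# U in [u, u + eps), and compare with g(u)/2 + (1/(2(1-u))) int_u^1 g.
rng = np.random.default_rng(0)
n = 10_000_000
x1 = rng.uniform(size=n)
u_all = np.minimum(x1, rng.uniform(size=n))

g = lambda x: x ** 2          # arbitrary bounded test function
u, eps = 0.3, 1e-3
window = (u_all >= u) & (u_all < u + eps)
empirical = g(x1[window]).mean()

# Closed form for g(x) = x^2: int_u^1 x^2 dx = (1 - u^3) / 3.
exact = 0.5 * g(u) + 0.5 * (1 - u ** 3) / (3 * (1 - u))
print(empirical, exact)       # agree up to Monte Carlo / window bias
```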
-
Thank you! Actually I managed the part on the conditional distribution; I had real difficulties proving that the thing I found was the joint probability distribution (which I needed in order to prove that the other thing I found was a version of the conditional expectation). Testing on all $g$'s is quite a good idea :) – Kore-N Nov 01 '15 at 10:10
-
To be complete, yes this is a good idea (one could even call it the canonical approach) but it is not mine... :-) – Did Nov 01 '15 at 10:19
2 Answers
A measure-theoretic solution:
Write $X = X_1$ and $Y = X_2$, and let $F$ denote the distribution function of $X$. Intuition suggests that the conditional probability of the event $[U > u]$ given the $\sigma$-field $\sigma(X)$ is given by $$P(U > u \mid X)(\omega) = I_{[X > u]}(\omega)(1 - F(u)) \tag{$*$}.$$ (A rigorous proof of $(*)$ is given at the end of this answer.) Intuitively, for fixed $u$, given the observation of $X(\omega)$: if $X(\omega) \leq u$, there is no hope that $U(\omega)$, which is at most $X(\omega)$, can exceed $u$; this explains the indicator factor in $(*)$. If $X(\omega) > u$, then $U(\omega) > u$ if and only if $Y(\omega) > u$, which, by the independence of $X$ and $Y$, has probability $1 - F(u)$.
Now for any $x \in \mathbb{R}^1$, since $[X \leq x] \in \sigma(X)$, it follows by $(*)$ that \begin{align} & P(U > u, X \leq x) \\ = & \int_{X \leq x} P(U > u|X)(\omega) dP \qquad \text{by the definition of conditional probability} \\ = & \int_{X \leq x} (1 - F(u))I_{[X > u]}(\omega) dP \qquad \text{by $(*)$} \\ = & \int_{(-\infty, x]}(1 - F(u))I_{(u, +\infty)}(v) dF(v) \qquad \text{change of variable formula}\\ = & \begin{cases} 0 & \text{if $x \leq u$}; \\ (1 - F(u))(F(x) - F(u)) & \text{if $x > u$}. \end{cases} \end{align} Together with $P(X \leq x) = F(x)$, this gives \begin{align} & P(U \leq u, X \leq x) \\ = & P(X \leq x) - P(U > u, X \leq x) \\ = & \begin{cases} F(x) & \text{if $x \leq u$}; \\ F(x) - (1 - F(u))(F(x) - F(u)) & \text{if $x > u$}. \end{cases} \tag{$**$} \end{align}
Proof of $(*)$: Clearly, the right-hand side of $(*)$ is $\sigma(X)$-measurable, in view of $\{X > u\} \in \sigma(X)$. In addition, for every $H \in \mathscr{R}^1$, by the change of variable formula, \begin{align*} & \int_{X \in H} I_{[X > u]}(\omega)(1 - F(u))dP = (1 - F(u))\int_H I_{(u, +\infty)}(x)dF(x) = P(Y > u)P([X \in H] \cap [X > u]) \\ = & P([Y > u] \cap [X > u] \cap [X \in H]) = P([U > u] \cap [X \in H]), \end{align*} where the independence of $X$ and $Y$ was used in the first equality of the second line. Since every element of $\sigma(X)$ is of the form $[X \in H]$, this proves $(*)$.
Some poster was curious about why I didn't report the joint density (i.e., pdf) instead of the joint CDF. Well, the fact is: this is an example of a random vector that doesn't have a joint density!
Let $A = \{(x, u): x = u\}$ and let $\mu$ be the probability measure on $\mathbb{R}^2$ induced by $(X, U)$. It can be seen that $$\mu(A) = P(X = U) = P(X \leq Y) = \frac{1}{2}.$$ Therefore there is a positive probability mass concentrated on a set of Lebesgue measure $0$, so it is impossible for $(X, U)$ to have a joint density $f$ with respect to the planar Lebesgue measure $\lambda_2$ (otherwise, $\mu(A) = \iint_A f(x, u)\, dx\, du = 0$, a contradiction!). Therefore the clearest way to describe the joint distribution of $(X, U)$ is still through $(**)$.
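(In the uniform case of the question, $F(t) = t$, and both $(**)$ and the diagonal mass $\mu(A) = \frac12$ can be checked by simulation; a sketch assuming `numpy`, with arbitrary evaluation points.)

```python
import numpy as np

# Empirically verify (**) with F(t) = t, and the diagonal mass mu(A) = 1/2.
rng = np.random.default_rng(0)
n = 2_000_000
x = rng.uniform(size=n)
u_min = np.minimum(x, rng.uniform(size=n))

def joint_cdf(u, xx):  # formula (**) specialized to F(t) = t
    return xx if xx <= u else xx - (1 - u) * (xx - u)

for u, xx in [(0.2, 0.1), (0.3, 0.7), (0.8, 0.9)]:
    print(np.mean((u_min <= u) & (x <= xx)), joint_cdf(u, xx))

# Mass on the Lebesgue-null diagonal A = {x = u}: should be ~1/2.
print(np.mean(x == u_min))
```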

-
@A.S. This is the joint CDF of $(U, X)$, which already characterizes the joint distribution of $(U, X)$. Of course, you can easily get the joint pdf if you want, but to me, the joint CDF is more essential than the PDF for a random vector. – Zhanxiong Nov 03 '15 at 15:38
-
I know a method to derive the distribution from the CDF (see my answer to this very question), but it was criticized (in deleted comments, in my opinion completely unfairly and without offering an alternative), so I wondered if there are other methods or if you'd use the same approach. As a side note, while a CDF fully characterizes the distribution, it is WAY harder to intuit and visualize than the distribution itself. – A.S. Nov 03 '15 at 15:50
-
1"Distribution" and "cdf" are actually the identical term, don't confuse them as different things. – Zhanxiong Nov 03 '15 at 15:58
-
No. A distribution is a distribution, and a CDF is a Cumulative Distribution Function, in other words an integral of the distribution. – A.S. Nov 03 '15 at 15:59
-
I think you are thinking of a distribution as a density. First things first: every random variable has a distribution, but not every random variable has a density, as you will know from advanced probability theory. This is why I wrote in my first comment that the CDF is more essential than the PDF. – Zhanxiong Nov 03 '15 at 16:02
-
@A.S. I apologize: I should not have said "you can easily get the joint pdf if you want" (I was on a bus at the time and didn't think it through). Actually, this problem gives an exact example where the density of $(X, U)$ doesn't exist. Please see my extended answer. – Zhanxiong Nov 03 '15 at 19:30
-
Sorry for the delay. I read the answer now, and I like it :) Thanks for the help! – Kore-N Dec 27 '15 at 13:11
Conditioning on sets of measure zero makes symmetry arguments misleading. A bulletproof approach works with the CDF (here in survival form):
$$P(X_1> x,U> u)=P(X_1>\max(x,u))P(X_2>u)=(1-\max(x,u))(1-u)$$ Differentiating (in the distributional sense, so that the unit step has the Dirac delta as its derivative) we get: $$f_{X_1,U}(x,u)=\frac {\partial^2P}{\partial x\partial u}=\frac{\partial}{\partial u}\bigl(-I(x>u)(1-u)\bigr)=I(x>u)+(1-u)\delta(x=u)$$
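(A simulation sketch of both formulas, assuming `numpy` and arbitrary test points: the survival identity, and the decomposition into the density $I(x>u)$ off the diagonal plus the line density $1-u$ on it, whose tail integral is $P(X_1 = U,\, U > u_0) = \int_{u_0}^1 (1-t)\,dt = (1-u_0)^2/2$.)

```python
import numpy as np

# Check P(X1 > x, U > u) = (1 - max(x, u))(1 - u), and the line mass:
# P(X1 = U, U > u0) = int_{u0}^1 (1 - t) dt = (1 - u0)^2 / 2.
rng = np.random.default_rng(0)
n = 2_000_000
x1 = rng.uniform(size=n)
u = np.minimum(x1, rng.uniform(size=n))

for xx, uu in [(0.2, 0.5), (0.6, 0.3), (0.5, 0.5)]:
    print(np.mean((x1 > xx) & (u > uu)), (1 - max(xx, uu)) * (1 - uu))

u0 = 0.4
print(np.mean((x1 == u) & (u > u0)), (1 - u0) ** 2 / 2)  # ~0.18
```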

-
@Did The method I present is correct, applies to all distributions if you interpret the derivative distributionally, and in the case of mixtures of discrete and continuous distributions requires no adaptations from "the usual" beyond $u'=\delta$. The largest extent of your comment could have been additional emphasis on the fact that differentiation in this case is distributional, not classical. On the other hand... – A.S. Nov 03 '15 at 01:04
-
...the method you presented in this case requires conditioning for the first step and a hunch to represent $g(x,x)=\int g(x,u)\delta_x(u)du$ for the second. How would you even use this method to find the distribution with CDF $F(x,y)=\frac 1 2 xy+\frac 1 4 (x+y)$ on $[0,1]^2$? – A.S. Nov 03 '15 at 01:05
-
Funny: the OP manages to make a whole thread tearing their approach apart disappear, then reposts two phony (and insulting) comments repeating the same claims. Interestingly, since every comment from me was deleted, the pair of comments above was not signalled to me. OP: The precise and fully rigorous approach I mentioned in the now disappeared comments to your question, which seem to anger you so much, is still visible in comments to the main question hence beware, everybody can see you are not addressing it. ... – Did Nov 04 '15 at 11:11
-
... Mathematically speaking, it is simply untrue that the approach I advocated requires some conditioning for the first step (which conditioning? which first step?) or some hunch (??). (Such strange allegations (and the globally hysterical tone) might actually reflect the OP's lack of understanding.) Re the supposedly slam-dunk case of the CDF $F(x,y)=\frac12xy+\frac14(x+y)$ on $[0,1]^2$, it is not quite clear what the example should show in the OP's mind. As was already noted on the page by others, a distribution can be described by a CDF hence what does "to find distribution with a CDF" ... – Did Nov 04 '15 at 11:12
-
... (in English, "to deduce a distribution from a CDF") even mean, one wonders. To compute a PDF? Not in this case, since this measure has a singular part. Then what? Anyway, analyzing rigorously this CDF is not as difficult as the OP may think: since $F(x,0)=\frac14x$, a measure of total mass $\frac14$ is evenly distributed on the segment $[0,1]\times\{0\}$; likewise for the segment $\{0\}\times[0,1]$; and $\partial^2F/\partial x\partial y=\frac12$ on the open unit square hence a measure of total mass $\frac12$ (the remaining mass) is evenly distributed on the unit square. With formulas, ... – Did Nov 04 '15 at 11:13
-
... this reads $$P_{X,Y}(dx,dy)=\tfrac14\delta_0(dx)\mathbf 1_{(0,1)}(y)dy+\tfrac14\mathbf 1_{(0,1)}(x)dx\,\delta_0(dy)+\tfrac12\mathbf 1_{(0,1)}(x)\mathbf 1_{(0,1)}(y)dxdy.$$ In terms of test functions $g$, $$E(g(X,Y))=\tfrac14\int_0^1g(x,0)dx+\tfrac14\int_0^1g(0,y)dy+\tfrac12\int_0^1\!\!\int_0^1g(x,y)dxdy.$$ And, to get a "visual" understanding of this distribution (to me, the best representation of $P_{X,Y}$), note that the random couple $(X,Y)$ is $(U,V)$ with probability $\frac12$, is $(U,0)$ with probability $\frac14$, and is $(0,V)$ with probability $\frac14$, where $(U,V)$ is uniform ... – Did Nov 04 '15 at 11:14
-
... on the unit square. To sum up, measure theory works, as this cute little exercise again shows--and now what? – Did Nov 04 '15 at 11:15
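(The mixture description above translates directly into a sampler, giving one way to double-check the decomposition of $F(x,y)=\tfrac12xy+\tfrac14(x+y)$; a sketch assuming `numpy`, with arbitrary grid points.)

```python
import numpy as np

# Sample (X, Y): equal to (U, V) w.p. 1/2, (U, 0) w.p. 1/4, (0, V) w.p. 1/4,
# with U, V iid uniform on [0, 1]; then compare the empirical CDF with F.
rng = np.random.default_rng(0)
n = 1_000_000
uu = rng.uniform(size=n)
vv = rng.uniform(size=n)
comp = rng.choice(3, size=n, p=[0.5, 0.25, 0.25])  # 0:(U,V) 1:(U,0) 2:(0,V)
x = np.where(comp == 2, 0.0, uu)
y = np.where(comp == 1, 0.0, vv)

for a, b in [(0.3, 0.3), (0.5, 0.9), (1.0, 0.4)]:
    print(np.mean((x <= a) & (y <= b)), 0.5 * a * b + 0.25 * (a + b))
```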
-
@Did I did not delete the comments; I assumed that you did, since you have more privileges. A CDF (Cumulative Distribution Function) is a cumulative/integral presentation of a distribution, and a distribution is the (distributional) derivative of its CDF. I hope this makes it obvious for you what one needs to do to shift from one form of presentation to the other. In the example I gave you, you didn't use the method you presented above but had to make ad-hoc substitutions to tease out the structure of the distribution. Simply (distributionally) differentiating the CDF requires no such guesswork. – A.S. Nov 04 '15 at 11:34
-
"I hope this makes it obvious for you" Yes, sure. Please continue to make a fool of yourself. – Did Nov 04 '15 at 11:57
-
@A.S. You must realize, as I posted in my answer and comments, that the CDF and the PDF are not two equivalent "forms" of a random variable/vector. Please figure this out before arguing with others. Other usages of terms, such as "cumulative/integral presentation" and "(distributional) derivative", etc., are also imprecise or at least vague. – Zhanxiong Nov 04 '15 at 17:16
-
@Sol You must realize that I never asked you for a PDF - I asked you for a distribution with a given CDF. In other words, I asked you for a probability measure with that CDF. Please see https://en.wikipedia.org/wiki/Probability_distribution, esp. https://en.wikipedia.org/wiki/Probability_distribution#Cumulative_distribution_function and https://en.wikipedia.org/wiki/Probability_distribution#Delta-function_representation. Here is a rigorous definition of distributional derivative (of a CDF): https://en.wikipedia.org/wiki/Distribution_(mathematics)#Derivatives_of_distributions. Any questions? – A.S. Nov 04 '15 at 23:15
-
@Sol Rereading the exchanges above, I hypothesize that what we are witnessing might be the classic confusion between the notion of distribution à la Schwartz and that of distribution in the measure-theoretic sense. (This, plus the OP's arrogance.) – Did Nov 07 '15 at 08:25
-
@Did That does seem to be the source of confusion, but not on my part. I repeatedly stated that the differentiation of a CDF is in the distributional sense (which is more general than finding the Radon-Nikodym derivative with respect to Lebesgue measure of the measure induced by a CDF). How else can you make sense of $u'=\delta$ (which nicely unifies the treatment of continuous and discrete distributions)? – A.S. Nov 07 '15 at 12:43
-
To sum up, you deliberately adopted conventions from another domain of mathematics to solve this question from a given domain, never even mentioning the fact and insulting everyone who dared to use classical methods and definitions from the given domain. And, characteristically, when someone finally provides you with an explanation, you again thrash them. The only excuse (sort of) I can imagine for this behaviour is if you are very young and very brash. – Did Nov 07 '15 at 15:47
-
@Did First, I want to thank you for sharing your knowledge of probability here - I learnt a lot from our interactions and reading your older posts. But I see this interaction very differently. You commented on the method that was different from your favorite measure-theoretic approach. I clarified the meaning of differentiation in my answer, pointed out that it differs from your favorite approach yet you insisted that it's wrong. You figured out the source of your confusion and I clarified that this time there was no confusion on my part. Having said that, I'd like to bury the hatchet. – A.S. Nov 07 '15 at 19:08
-
@A.S. I see: the rest of the world is confused but you know better. That sums it up, I guess. – Did Nov 07 '15 at 21:47