Picking two random real numbers between 0 and 1, why isn't the probability that the first is greater than the second exactly 50%?

Question

I attempted to answer this question on Quora, and was told that I am thinking about the problem incorrectly. The question was:

Two distinct real numbers between 0 and 1 are written on two sheets of paper. You have to select one of the sheets randomly and declare whether the number you see is the biggest or smallest of the two. How can one expect to be correct more than half the times you play this game?

My answer was that it was impossible, as the probability should always be 50% for the following reason:

You can't! Here's why:

The set of real numbers between (0, 1) is known as an Uncountably Infinite Set (https://en.wikipedia.org/wiki/Uncountable_set). A set that is uncountable has the following interesting property:

Let $\mathbb{S}$ be an uncountably infinite set. Let, $a, b, c, d \in \mathbb{S} (a \neq b, c \neq d)$. If $x$ is an uncountably infinite subset of $\mathbb{S}$, containing all elements in $\mathbb{S}$ on the interval $(a, b)$; and $y$ is another uncountably infinite subset of $\mathbb{S}$, which contains all elements of $\mathbb{S}$ on the interval $(c, d),$ $x$ and $y$ have the same cardinality (size)!

So for example, the set of all real numbers between (0, 1) is actually the exact same size as the set of all real numbers between (0, 2)! It is also the same size as the set of all real numbers between (0, 0.00001). In fact, if you have an uncountably infinite set on the interval $(a, b)$, and $a<n<b$, then then exactly 50% of the numbers in the set are greater than $n$, and 50% are less than $n$, no matter what you choose for $n$. This is important because it tells us something unintuitive about our probability in this case. Let's say the first number you picked is 0.03. You might think "Well, 97% of the other possible numbers are larger than this, so the other number is probably larger." You would be wrong! There are actually exactly as many numbers between (0, 0.03) as there are between (0.03, 1). Even if you picked 0.03, half of the other possible numbers are smaller than it, and half of the other possible numbers are larger than it. This means there is still a 50% probability that the other number is larger, and a 50% probability that it is smaller!

"But how can that be?" you ask, "why isn't $\frac{a-b}{2}$ the midpoint?"

The real question is, why is it that we believe that $\frac{a-b}{2}$ is the midpoint to begin with? The reason is probably the following: it seems to make the most sense for discrete (finite/countably infinite) sets. For example, if instead of the real numbers, we took the set of all multiples of $0.001$ on the interval $[0, 1]$. Now it makes sense to say that 0.5 is the midpoint, as we know that the number of numbers below 0.5 is equal to the number of numbers above 0.5. If we were to try to say that the midpoint is 0.4, we would find that there are now more numbers above 0.4 then there are below 0.4. This no longer applies when talking about the set of all real numbers $\mathbb{R}$. Strangely enough, we can no longer talk about having a midpoint in $\mathbb{R}$, because every number in $\mathbb{R}$ could be considered a midpoint. For any point in $\mathbb{R}$, the numbers above it and the numbers below it always have the same cardinality.

See the Wikipedia article on Cardinality of the continuum (https://en.wikipedia.org/wiki/Cardinality_of_the_continuum).

My question is, from a mathematical point of view, is this correct? The person who told me that this is wrong is fairly well known, and not someone who I would assume to often be wrong, especially for these types of problems.

The reasoning given for my answer being wrong was as follows:

Your conclusion is not correct.
You're right that the set of real numbers between 0 and 1 is uncountable infinite, and most of what you said here is correct. But that last part is incorrect. If you picked a random real number between 0 and 1, the number does have a 97% chance of being above 0.03. Let's look at this another way. Let K = {all integers divisible by 125423423}. Let M = {all integers not divisible by 125423423}. K and M are the same size, right? Does this mean, if you picked an random integer, it has a 50% chance of being in K and a 50% chance or not? A random integer has a 50% chance of being divisible by 125423423?

The reason I disagreed with this response was because the last sentence should actually be true. If the set of all numbers that are divisible by 125423423 is the same size as the set of numbers that aren't, there should be a 50% probability of picking a random number from the first set, and a 50% chance that a number would be picked from the second. This is cirtainly the case with finite sets. If there are 2 disjoint, finite sets with equal cardinality, and you choose a random number from the union of the two sets, there should be a 50% chance that the number came from the first set, and a 50% chance that the number came from the second set. Can this idea be generalized for infinite sets of equal cardinality?

Is my answer wrong? If so, am I missing something about how cardinalities of two set relate to the probability of choosing a number from one of them? Where did I go wrong in my logic?

You are missing a Jacobian in the transformation of (continuous) variables. Density of $U(0,1)$ is twice that of $U(0,2)$ - so while each particular $x$ has zero probability in both cases, these zeros are of different orders - one is twice as large as the other. — A.S., Dec 24 '15 at 00:35
http://math.stackexchange.com/questions/655972/help-rules-of-a-game-whose-details-i-dont-remember/656426#656426 describes a strategy that will win strictly more than half the time. — MJD, Dec 24 '15 at 00:36
Your reasoning about probability using cardinality leads to a paradox. If you select a number uniformly from (0,1], then you are just as likely to select a number from (1/2, 1] as (1/4, 1/2] as (1/8,1/4] as (1/16, 1/8], ..., and so on. So what is the probability of each of these events? — panofsteel, Dec 24 '15 at 00:42
Cardinality is a red herring. If the number I see is "0.999", I have amazing confidence I'm looking at the larger of the two numbers. — Eric Towers, Dec 24 '15 at 01:26
@EricTowers Your confidence assumes an unskilled player. See my answer. — Eugene Ryabtsev, Dec 24 '15 at 05:08
You cannot mix and match Discrete Probability Distribution with Continuous Probability Distribution. Let me ask you these two questions:

What is the probability that a randomly chosen number between 0 and 1 is exactly 0.278?

What is the probability that a randomly chosen number between 0 and 1 lies in the range 0.45 to 0.60? — Masked Man, Dec 24 '15 at 06:07
Your title asks a different question from that you quote. Wording makes all the difference in statistics problems. This may be part of the reason you're confused. — Carl Witthoft, Dec 24 '15 at 12:04
This question must be either edited, or deleted. The title and the statement of the question in the body are totally unrelated. (Do not mention that additionally, the very extended discussion is possibly again unrelated to either of those.) The following should be done: (1) eliminate the extended discussion as it only adds confusion (2) state the problem itself clearly (3) change the title to express the problem (or just make it a general title like "Problem regarding random numbers"). — Fattie, Dec 24 '15 at 14:58
@JoeBlow : Agree. Title claims numbers are chosen at random. Problem statement does not. Answers are all over the range of interpretations on this axis of ambiguity. Other defects are present as well. — Eric Towers, Dec 24 '15 at 21:20
I also agree that the title is disjoint from the body of the question, and needs to be edited. — Daniel R. Collins, Dec 24 '15 at 21:39
@WorBlux What part of the question assumes a fixed font and paper size? — David Richerby, Dec 25 '15 at 17:32

Milo Brandt · Accepted Answer · 2015-12-26T21:55:09.527

Yes, your answer is fundamentally wrong. Let me point at that it is not even right in the finite case. In particular, you are using the following false axiom:

If two sets of outcomes are equally large, they are equally probable.

However, this is wrong even if we have just two events. For a somewhat real life example, consider some random variable $X$ which is $1$ if I will get married exactly a year from today and which is $0$ otherwise. Now, clearly the sets $\{1\}$ and $\{0\}$ are equally large, each having one element. However, $0$ is far more likely than $1$, although they are both possible outcomes.

The point here is probability is not defined from cardinality. It is, in fact, a separate definition. The mathematical definition for probability goes something like this:

To discuss probability, we start with a set of possible outcomes. Then, we give a function $\mu$ which takes in a subset of the outcomes and tells us how likely they are.

One puts various conditions on $\mu$ to make sure it makes sense, but nowhere do we link it to cardinality. As an example, in the previous example with outcomes $0$ and $1$ which are not equally likely, one might have $\mu$ defined something like: $$\mu(\{\})=0$$ $$\mu(\{0\})=\frac{9999}{10000}$$ $$\mu(\{1\})=\frac{1}{10000}$$ $$\mu(\{0,1\})=1$$ which has nothing to do with the portion of the set of outcomes, which would be represented by the function $\mu'(S)=\frac{|S|}2$.

In general, your discussion of cardinality is correct, but it is irrelevant. Moreover, the conclusions you draw are inconsistent. The sets $(0,1)$ and $(0,\frac{1}2]$ and $(\frac{1}2,1)$ are pairwise equally large, so your reasoning says they are equally probable. However, the number was defined to be in $(0,1)$ so we're saying all the probabilities are $1$ - so we're saying that we're certain that the result will be in two disjoint intervals. This never happens, yet your method predicts that it always happens.

On a somewhat different note, but related in the big picture, you talk about "uncountably infinite sets" having the property that any non-trivial interval is also uncountable. This is true of $\mathbb R$, but not all uncountable subsets - like $(-\infty,-1]\cup \{0\} \cup [1,\infty)$ has that the interval $(-1,1)=\{0\}$ which is not uncountably infinite. Worse, not all uncountable sets have an intrinsic notion of ordering - how, for instance, do you order the set of subsets of natural numbers? The problem is not that there's no answer, but that there are many conflicting answers to that.

I think, maybe, the big thing to think about here is that sets really don't have a lot of structure. Mathematicians add more structure to sets, like probability measures $\mu$ or orders, and these fundamentally change their nature. Though bare sets have counterintuitive results with sets containing equally large copies of themselves, these don't necessarily translate when more structure is added.

Tangent: this is why Statistical Mechanics stumped me in university. (To elaborate: it starts with exactly this postulate, that every microstate has the same probability (given some conditions).) — TLW, Dec 25 '15 at 13:22
@Milo Brandt: which book contains this type of problems? Can you recommend anything ? I really don't know! — Chinnapparaj R, Dec 24 '18 at 05:10

score 42 · Answer 2 · edited Apr 13 '17 at 12:19

The OP's answer is incorrect. The numbers are not chosen based on cardinality, but based on measure. It is not possible to define a probability distribution using cardinality (on an infinite set). However it is possible using measure.

Although the problem doesn't specify, if we assume the uniform distribution on $[0,1]$, then if $x=0.03$ then $y$ will be greater than $x$ 97% of the time. Of course, if a different probability distribution is used to select $x,y$, then a different answer will arise. It turns out that it is possible to win more than half the time even NOT KNOWING the distribution used, see this amazing result here.

Eugene Ryabtsev · Answer 3 · 2015-12-24T05:17:47.263

Two distinct real numbers between 0 and 1 are written on two sheets of paper.

In the beginning, nothing is known about the choosing-and-writing process other than the range of possible values.

You have to select one of the sheets randomly and declare whether the number you see is the biggest or smallest of the two.

But one can see the number before making a statement.

How can one expect to be correct more than half the times you play this game?

One can observe the way the game is played and try to model the algorithm used to choose the numbers and use that model when making a decision.

If the algorithm seems to be uncorelated random uniform distribution and the number seen is less than 0.5, than declare it to be the least of the two. This is easy to beat.
If the algorithm seems to be picking some number at random, writhing it to Nth place where N >> 1 and then writing second number differing by 1 in the last place, then it is largely irrelevant what to declare. This is impossible to beat.
If the algorithm is to actively confuse you about what the algorithm is, things might get really interesting.

The original question is not about math so much as it is about game theory. In this game, any side can choose to play to a tie and be successful at that. Otherwise, it's a contest of skill (though it seems there is no winning strategy for side #1 other than persuading side #2 into other than 2-to-1 payoff structure).

score 2 · Answer 4 · answered Dec 24 '15 at 00:28

2

Here's my thinking:

Suppose the first random number is $x$, $0 \le x \le 1$.

If $y$ is the second random number, the probability that $y \le x$ is $x$.

The over probability is $\int_0^1 x\,dx =\frac{x^2}{2}\big|_0^1 =\frac12 $.

answered Dec 24 '15 at 00:28

marty cohen

107,799

2

So we assume a uniform distribution. (Not stated in the problem.) However, this calculation is still not what is in the problem. This calculation is the probability that $X \ge Y$. But what we really want is the conditional probability that $X \ge Y$ given $Y$. Might that answer not depend on $Y$? And thus enable us to make a guess that is correct better than $1/2$ the time? – GEdgar Dec 30 '15 at 14:07

score 2 · Answer 5 · answered Dec 24 '15 at 00:37

2

A. on half the occasions one of the numbers will be greater than $\frac12$ and the other will be less than $\frac12$.

B. on a quarter of occasions both numbers will be less than $\frac12$

C. on the remaining quarter of occasions both will be greater than $\frac12$

if you adopt the strategy of settling for a number when it is greater than $\frac12$ then you will be right every time in case A. in cases B and C you will be right half of the time.

thus your probability of success overall is $\frac34$

answered Dec 24 '15 at 00:37

David Holden

18,040

4

This assumes the original two numbers were uniformly distributed. However that isn't guaranteed. The original wording only says this: "Two distinct real numbers between 0 and 1 are written on two sheets of paper." – kasperd Dec 24 '15 at 11:24

score 1 · Answer 6 · answered Dec 24 '15 at 22:14

Here's my interpretation of the question being asked: (1) Ignore the title question and attend only to the text question, which is currently different. (2) Assume that two numbers are chosen at random from a uniform distribution on $(0, 1)$. (3) Assume that the player can see the first number before making a guess about the relation to the second.

Granted that, then the OP's first error in reasoning is this. Basically you've mixed up reasoning between discrete (finite) and continuous (infinite) situations:

In fact, if you have an uncountably infinite set on the interval $(a,b)$, and $a<n<b$, then exactly 50% of the numbers in the set are greater than $n$, and 50% are less than $n$, no matter what you choose for $n$.

This reasoning would be true for finite sets. Look at the standard definition of "proportion" (e.g. $50\%$): $p = x/N$, where $x$ is the number of elements with some characteristic, and $N$ is the total size of the set. Then if $n$ elements have the characteristic, and $n$ likewise do not have the characteristic, then $p = n/(2n) = 1/2 = 50\%$.

But this calculation is undefined for infinite-size sets! You might overlook the assumption that $x$ and $N$ must be natural numbers, not some transfinite values. In order to begin discussing relations in sizes of sets, you must make use of measure theory, which is missing from the OP's observations. It follows that the classical definition of probability by proportions (Fermat, Pascal) is likewise only legitimate for finite sample spaces. Defining probability distributions on infinite sample spaces can only be done through use of integrals.

Consider carefully the definition of the continuous uniform probability distribution. The cumulative distribution function is found by integration of the probability density function (from Wikipedia):

For the OP, $a = 0$ and $b = 1$ (i.e., standard uniform), so $f(x) = 1$ and $F(x) = x$ on the domain of support. So for example, if $x = 0.03$, then the probability that a second random number from this uniform distribution is less than $x$ is $F(0.03) = 0.03 = 3\%$, not $50\%$.

score 0 · Answer 7 · answered Dec 26 '15 at 22:38

First and most important thing is that it is not possible to select a random number from an infinite set and with that from an uncountable set.

In that sense, a random real number is an abstraction. We technically always take some sort of small intervals and measure the size like the length, instead of counting.

In math, we take the limit of some selection process, but even with that we never select one real number, we just make them all measured or reached by the same process, proving that the limiting process of reaching any of them is equivalent.

For example, selecting a real number would require defining all possible sequences of some sort. Assume we are dealing with decimal notation. So we say: we pick a real number by taking a sequence of numbers from 0 to 9, always selection a digit with the probability of 1/10. This is abstraction, since we cannot do that to infinity.

Still taking it from your perspective, you are saying that for each number in the interval (0, 0.3), we have a match from the interval [0.3, 1). Indeed we have 1:1 mapping. However, the process of abstract selection must be a little bit different: you need to select a number from 0 to 3 for the first digit and from 4 to 9 for another group. If we make this abstract selection in sort of uniform way, we should better say that we need to pick the same number of digits for each interval, one digit in both, then two digits in both and so on. Otherwise, we cannot compare the intervals. You can see immediately that we have the probability of $\frac{1}{4}$ for the first and $\frac{1}{6}$ for the second interval. The second digit can be any from 0 to 9 for both intervals so the probabilities are $\frac{1}{4} \cdot \frac{1}{9}$, and $\frac{1}{6} \cdot \frac{1}{9}$ respectively. After n digits we have $\frac{1}{4} \cdot (\frac{1}{9})^{n-1}$, and $\frac{1}{6} \cdot (\frac{1}{9})^{n-1}$. Although both probabilities tend to 0, they are never the same. The ratio for any n, and with that the limit itself, is always $\frac{4}{6}=\frac{2}{3}$, which means that the probabilities are the solution of $\frac{x}{y}=\frac{2}{3}$ and $x+y=1$, obviously $x \neq y$

As you can see you have to define a selection process and observe its limiting behavior, although in reality you cannot effectively execute any of it. The reason is the same as with summation where we cannot sum infinite series of numbers but we can reach the sum itself nevertheless.

This is the only way of talking about selection process with an infinite set. You can invent your own selection process, the only requirement is that each element must be indistinguishable within the process itself even at infinity.

score 0 · Answer 8 · answered Mar 05 '17 at 14:52

Either before or after I see the exposed number, I choose a number randomly between 0 and 1. If my number is greater than the exposed number, I guess that the exposed number is Low; if my number is smaller, I guess that the exposed number is High. This strategy wins at least 50% of the time, perhaps more.

Let the two numbers generated for me to guess be xL and xH; I see one of them, but do not know which. The two numbers divide the unit interval into three parts: (0,xL), (xL,xH),(xH,1). My number (generated without consideration of what my opponent has chosen) falls into these regions with probability equal to the size of the region. If my number is in either outside region, I'm just guessing and get probability 1/2. If my number falls inside the region, I win. If the opponent is restricted from throwing equal numbers, my strategy has a strictly greater than 50% chance to win. This is independent of any method chosen by the opponent as long as the opponent doesn't know my number (which I generate randomly). Apparently this strategy beats Zeus but not necessarily Apollo.

Musing: suppose the two numbers are today's and tomorrow's stock prices of a particular stock. Can I win by pairing sells and buys according to the above strategy? I simulated this once using stock prices moving according a multiplicative random walk; the strategy won about 2/3 of the time (with 2% vigorish subtracted from each transaction for expenses.)

score -1 · Answer 9 · edited Jun 12 '20 at 10:38

Two distinct real numbers between 0 and 1 are written on two sheets of paper. You have to select one of the sheets randomly and declare whether the number you see is the biggest or smallest of the two. How can one expect to be correct more than half the times you play this game?

If you don't know the values on the sheets

The probability that you pick the smaller is 50%, as you only have two choices - the pieces of paper.

The probability that you pick the larger is 50%, for the same reason.

As your choice is random ( you cannot see the sheets of paper or they are shuffled ), you will always average 50% of one sheet and 50% of the other.

If you can see the values you pick...

Once you can see the values, even if only one at a time, you can adapt your strategy.

If you pick a value $v$ you can weigh the possibility that it is the lowest or highest. Over many different games ( but not rounds in the same game ) you will do better than 50% because on average you'll be able to pick $max( v, 1-v )$ right.

But over many rounds you will, on average see both values. As you will know both values you can, as soon as both become known, always say with 100% accuracy that your choice is the largest or smallest. As the number of rounds grows, your asymptotic probability of being right tends to 100% in this scenario.

If the "dealer" changes the number on each sheet every round and you do not known the in advance, then you are back to a 50-50 chance.

If you know the values, but cannot see the value of each selection you made

Your chances of guessing whether the unseen value of any single choice is the highest or lowest remains 50%, as you have only two choices.

So this scenario gets you no advantage.

gnasher729 · Answer 10 · 2016-01-24T18:39:23.480

If the numbers were written at random, then the probability that the number on the first sheet is larger is exactly 50%. But if you learn which number is written on the first sheet the situation changes. 0.9 written on the first sheet has a chance of 90% to be the larger number. 0.1 only has a 10 percent chance. If you think it is strange that knowing about the number changes the probability then consider what happens if you know both numbers: If you know that the first sheet has the number 0.21 written on it and the second sheet has the number 0.19, then the probability is now 100% that the first sheet has a larger number.

So unless the first number is 0.50, you can make a guess about which sheet contains the larger number that has a chance greater than 50% to be correct. Guess the first number is larger if it is greater than 0.5. Guess the second number is larger if the first is smaller than 0.5. If the first number is 0.5 you guess the first number is larger.

Interesting what happens if you were told the numbers are random, but in fact they are chosen by an opponent who tries to make you make the wrong choice, and you follow the strategy above. If the opponent choses one number < 0.5 and one number ≥ 0.5 then no matter what sheet you choose, you win. If the opponent picked both numbers ≥ 0.5 then you will guess that the sheet you picked has the larger number; the chance is 50% that you picked the sheet that made your guess right. Same for picking two numbers < 0.5.

If the opponent sometimes picks two numbers on opposite sides of 0.5 you are winning. If the opponent doesn't, and assuming you are immediately told whether you won or lost, you will figure out that you are not winning as much as you thought you should. You could then do some statistical analysis how the numbers on one sheet are distributed, separately for the case that you are winning and that you are losing, and make the occasional guess that a number ≥ 0.5 is not the larger, or a number < 0.5 is not the smaller one.

Picking two random real numbers between 0 and 1, why isn't the probability that the first is greater than the second exactly 50%?

10 Answers10

If you don't know the values on the sheets

If you can see the values you pick...

If you know the values, but cannot see the value of each selection you made