3

Suppose we pick numbers uniformly at random from $[0, 2n - 1]$ with repetition. How many numbers do we need to pick, on average, until some two of the picked numbers sum to $2n$?

Here is a simplified version of the question. Suppose we pick numbers from the set $S = \{0, 1, 2, 3\}$ with repetition. How many numbers do we need to pick, on average, until some two of the picked numbers sum to 4? Monte-Carlo simulation suggests that the answer is approximately 4.222. I have no idea how to compute this analytically.

(Case n = 1) Consider the case where we pick numbers with repetition from $S = \{0, 1\}$ and want the expected number of picks until some two picked numbers sum to 2, that is, until a 1 has been drawn twice. We can compute the expectation like this:

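$E = \frac{1}{2}(1 + E) + \frac{1}{2}(\frac{1}{2}(2) + \frac{1}{2}(2 + E_1))$, where $E$ is the desired expectation and $E_1$ is the expected wait time until you draw a 1. (If the first pick is a 1 and the second is a 0, two picks have been used and we still wait $E_1$ more for the second 1.) So $E_1 = 2$, and $E = 4$.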
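The Monte-Carlo estimate mentioned in the question can be reproduced with a short simulation. This is only a sketch, not the asker's actual code: the function names `simulate_once` and `estimate`, the trial count, and the seed are assumptions chosen for illustration. It assumes uniform draws and that $n$ must be drawn twice to form the pair $n + n$.

```python
import random

def simulate_once(n, rng):
    """Draw uniformly from {0, ..., 2n-1} until two draws sum to 2n.

    Returns the number of draws used.
    """
    seen = set()      # distinct values drawn so far
    n_count = 0       # how many times n itself has been drawn
    draws = 0
    while True:
        x = rng.randrange(2 * n)
        draws += 1
        if x == n:
            # n pairs only with a second copy of n
            n_count += 1
            if n_count == 2:
                return draws
        elif x != 0 and (2 * n - x) in seen:
            # x in [1, 2n-1], x != n, and its partner 2n - x was drawn earlier
            return draws
        seen.add(x)   # 0 never has a partner, so it only wastes a draw

def estimate(n, trials=100_000, seed=0):
    """Average stopping time over `trials` independent runs (hypothetical defaults)."""
    rng = random.Random(seed)
    return sum(simulate_once(n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, estimate(n))
```

For $n = 1$ and $n = 2$ the estimates should land near 4 and 4.222 respectively, up to sampling noise.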

vamsikal
  • 105
  • Do you have any thoughts? – Patrick Stevens Mar 11 '17 at 06:42
  • I am drawing a complete blank. I have Monte-Carlo code that gives the approximate answers, but I have no way of coming up with the answer analytically. – vamsikal Mar 11 '17 at 06:45
  • @vamsikal: For a given $n$, I can get the exact answers with a program. Do you have any reason to believe that there's a closed form? – quasi Mar 11 '17 at 06:47
  • I am not sure whether there is a closed-form. But I would be very interested in the algorithm for computing the correct answer for a given n. – vamsikal Mar 11 '17 at 06:51
  • @vamsikal: I'll post my code as an answer in a few minutes. – quasi Mar 11 '17 at 06:58
  • This might be conceptually easier when phrased as "pick numbers until two are the same". (This problem is equivalent because each number is looking for exactly one partner to add up to $2n$; why not instead make the partner be itself?) – Patrick Stevens Mar 11 '17 at 06:58
  • Ah, then it's the birthday problem. – quasi Mar 11 '17 at 06:59
  • This sounds similar to this question: http://math.stackexchange.com/questions/214399/summing-0-1-uniform-random-variables-up-to-1 The difference here is that you sum just two numbers instead of all. – Thanassis Mar 11 '17 at 06:59
  • Oh, they have to sum up exactly to 2N, not equal or greater than 2N. Then this becomes a variant of the birthday problem. – Thanassis Mar 11 '17 at 07:01
  • Zero has no partner though, which is slowing down my formulation of a recurrence relation – Grant B. Mar 11 '17 at 07:02
  • @Patrick Stevens: This problem is very similar to, but not the same as, the birthday problem. Note that picking until two numbers are the same is not correct, since picking two zeros does not correspond to the sum being $2n$. – vamsikal Mar 11 '17 at 07:03
  • Oh :( is there not an obvious place to subtract $\frac{1}{2n}$ to account for $0$ basically being a non-pick? – Patrick Stevens Mar 11 '17 at 07:06
  • Let the probability of success in a trial be $p$. Then the expected number of trials to first success is $E = 1/p$. https://en.wikipedia.org/wiki/Geometric_distribution – antimatr0id Mar 11 '17 at 07:50
  • So, now we compute $p$. The number of possible outcomes is $\binom{2n}{2}$ and the number of "success" outcomes is $n$ (there are only $n$ pairs which sum up to $2n$). – antimatr0id Mar 11 '17 at 07:54
  • @antimatr0id: The geometric distribution shows up when $p$ is a constant. Here, the probability of success depends on the trial number. – vamsikal Mar 11 '17 at 07:55
  • @vamsikal Sorry, I misread the question :( – antimatr0id Mar 11 '17 at 07:58

3 Answers

1

Here's my revised implementation in Maple, hopefully now correct. [Image: Maple code]

quasi
  • 58,772
1

Call a sequence of numbers from $[0,2n-1]$ completed if it contains two numbers that add to $2n$. If the earliest such pair occurs at positions $j < m$ in the sequence, we say the sequence is completed at $m$ (the location of the second number). Let $a_m$ be the number of sequences of length $m$ completed at $m$. Then $a_m$ is the total number of sequences of length $m$, minus the number of sequences that were completed earlier, minus the number of incomplete sequences of length $m$ (call this $I_m$). That is, $$a_m = (2n)^{m}-\left(\sum_{i=1}^{m-1} a_i (2n)^{m-i}\right) - I_m$$ since the number of sequences of length $m$ completed at $i < m$ is $a_i (2n)^{m-i}$: we are free to choose whatever we like for the remaining $m-i$ numbers after completing.

An incomplete sequence is one that contains no pair of numbers that add to $2n$. Let us first consider incomplete sequences which do not contain the numbers $0$ or $n$ within them. In that case, we can have as many of each number as we wish within the sequence, as long as we do not have any of its pair. Let us fix the number of different digits we will have within our sequence, some $r$ with $1\leq r \leq n-1$. (If we have more than $n-1$ different numbers we will be including a pair and this will be a complete sequence, or be using $0$ or $n$.) We will choose these $r$ digits from the list $[1,n-1]$, giving us ${n-1}\choose r$ different sets of digits to use. But for each of these digits we can also choose to use its pair instead, so we actually have $2^r {{n-1}\choose{r}}$ different sets of digits to use. We just need to multiply this by the number of different sequences that use $r$ different digits and sum over $r$ to find the incomplete sequences not containing $0$ or $n$.

Now let $b_r^{(m)}$ denote the number of sequences of length $m$ that use each of $r$ different digits at least once. For instance, $b_1^{(m)}=1$ for $m \geq 1$ because there is only one sequence of length $m$ you can make with one digit, and $b_r^{(1)}=1$ if $r=1$ and $0$ if $r>1$, since there is no way to make a sequence of length $1$ using $2$ or more different digits at least once. Now let's find $b_r^{(m)}$ for $m \geq 1$. $b_0^{(m)}=0$ since with no digits you cannot make a sequence. Now $b_r^{(m)}$ is the total number of sequences you can make from $r$ digits, minus the sequences that use fewer digits: fix a number $i$, $1\leq i \leq r-1$. Then we have counted the sequences formed by using only $i$ of our $r$ digits, of which there are ${r \choose i}b_i^{(m)}$ (choose which digits are used in the sequence, and find how many sequences use exactly those digits); thus, we must subtract this from the total number of sequences. Therefore, $$b_r^{(m)}=r^m-\sum_{i=1}^{r-1}{r \choose i}b_i^{(m)}$$ Using this recurrence, we can show (by induction or generating functions) that $$b_r^{(m)}=\sum_{i=0}^r {r\choose i}(-1)^{r+i}i^m$$ considering $0^0$ in this case to be $1$ so that $b_r^{(0)}=0$ for $r \geq 1$.
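As a quick sanity check of the two expressions for $b_r^{(m)}$, the recurrence and the inclusion-exclusion sum can be compared numerically for $m \geq 1$. The names `b_rec` and `b_closed` are illustrative, not from the answer, and the check is a sketch rather than a proof.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def b_rec(r, m):
    """b_r^(m) via the recurrence: r^m minus sequences using fewer digits (r, m >= 1)."""
    return r ** m - sum(comb(r, i) * b_rec(i, m) for i in range(1, r))

def b_closed(r, m):
    """b_r^(m) via the alternating sum; Python evaluates 0**0 as 1, matching the convention."""
    return sum(comb(r, i) * (-1) ** (r + i) * i ** m for i in range(r + 1))

if __name__ == "__main__":
    for r in range(1, 5):
        print(r, [b_closed(r, m) for m in range(1, 6)])
```

For example, with $r = 2$ and $m = 3$ both give $2^3 - 2 = 6$: all binary strings of length 3 except the two constant ones.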

Then call the number of incomplete sequences of length $m$ not containing $0$ or $n$, $c_m$. From our above reasoning, $$c_m=\sum_{r=1}^{n-1}2^r {{n-1}\choose{r}}b_r^{(m)}.$$

The rest is easy. Call the number of incomplete sequences of length $m$, now possibly containing $0$ but still not $n$, $C_m$. Then for each $i$, $0\leq i \leq m-1$ we can have $i$ zeros (plus the sequence of all zeros for $i=m$). We can choose to put them in ${m \choose i}$ positions, and we are left with an incomplete sequence containing no $0$s of length $m-i$. Thus $$C_m = 1+ \sum_{i=0}^{m-1}{m\choose i}c_{m-i} = 1+ \sum_{i=1}^{m}{m\choose i}c_{i}$$ And finally we can introduce a single $n$ into the sequence in at most one of the $m$ positions in the sequence, so $$I_m = C_m + m C_{m-1}$$ Now we have a recurrence for $a_m$, the number of sequences of length $m$ that are completed at the $m$th digit. Therefore $a_m/(2n)^m$ is the probability that we will first see a pair adding to $2n$ after picking $m$ digits. Thus the expected number of digits we will have to pick is $$\sum_{m=1}^\infty m \frac{a_m}{(2n)^m}$$ In terms of the generating function $A(x)=\sum_{m=1}^\infty a_m x^m$, the expected value is $A'(1/(2n))/(2n)$. I will look into finding closed forms tomorrow (spent my whole day on this already), but the problem is solved once a generating function or closed form is found for $I_m$.
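The counting formulas for $c_m$, $C_m$ and $I_m$ can be checked against direct enumeration for small $n$ and $m$, and the recurrence for $a_m$ can then be summed numerically. The sketch below is an illustration only: the function names and the truncation point `M` are assumptions, and the infinite series is simply cut off after `M` terms.

```python
from fractions import Fraction
from itertools import product
from math import comb

def b(r, m):
    """Length-m sequences over r symbols that use every symbol at least once."""
    return sum(comb(r, i) * (-1) ** (r + i) * i ** m for i in range(r + 1))

def c(n, m):
    """Incomplete sequences of length m >= 1 containing neither 0 nor n."""
    return sum(2 ** r * comb(n - 1, r) * b(r, m) for r in range(1, n))

def C(n, m):
    """Incomplete sequences of length m that may contain 0 but not n."""
    return 1 + sum(comb(m, i) * c(n, i) for i in range(1, m + 1))

def I(n, m):
    """All incomplete sequences of length m (at most one n allowed)."""
    return C(n, m) + (m * C(n, m - 1) if m > 0 else 0)

def brute_I(n, m):
    """Count incomplete sequences by direct enumeration (small n and m only)."""
    def incomplete(s):
        return all(s[i] + s[j] != 2 * n
                   for i in range(m) for j in range(i + 1, m))
    return sum(incomplete(s) for s in product(range(2 * n), repeat=m))

def expected_truncated(n, M=150):
    """Approximate sum_{m>=1} m * a_m / (2n)^m, with a_m from the recurrence, cut off at M."""
    N = 2 * n
    a = [0] * (M + 1)
    for m in range(1, M + 1):
        earlier = sum(a[i] * N ** (m - i) for i in range(1, m))
        a[m] = N ** m - earlier - I(n, m)
    return float(sum(Fraction(m * a[m], N ** m) for m in range(1, M + 1)))

if __name__ == "__main__":
    print(I(2, 2), brute_I(2, 2))
    print(expected_truncated(1), expected_truncated(2))
```

For $n = 2$, $m = 2$, both counts give 13: of the 16 pairs over $\{0,1,2,3\}$, only $(1,3)$, $(3,1)$ and $(2,2)$ are completed.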

Edit: The exponential generating function $\sum_{m=0}^\infty b_r^{(m)}\frac{x^m}{m!}=(e^x-1)^r$. From this we find the egf of $(I_m)$ is $$(1+x)e^x[2e^x-1]^{n-1} = (1+x)\sum_{k=1}^{n}{{n-1}\choose {k-1}}(-1)^{n-k}2^{k-1} e^{kx}$$ Then we use Laplace transforms to find the ordinary generating function $$I(s) = \sum_{m=0}^\infty I_m s^m = \sum_{k=1}^n{{n-1}\choose{k-1}}(-1)^{n-k}2^{k-1}\left[\frac{1}{1-sk}+\frac{s}{(1-sk)^2}\right]$$ and if we let $s=x/(2n)$ (reusing the name $I$ for the rescaled function) we get $$I(x) = n\sum_{k=1}^n{{n-1}\choose{k-1}}(-1)^{n-k}2^{k}\left[\frac{1}{2n-xk}+\frac{x}{(2n-xk)^2}\right]$$ From our initial definitions we find $$P(x)=\sum_{m=0}^\infty \frac{a_m}{(2n)^m}x^m=1-(1-x)I(x)$$ Then our expected value, in closed form, is $$E(n)=P'(1)=I(1)=n\sum_{k=1}^n{{n-1}\choose{k-1}}(-1)^{n-k}2^{k}\left[\frac{1}{2n-k}+\frac{1}{(2n-k)^2}\right]$$ $E(1)=4$ and $E(2)=\frac{38}{9}\approx 4.222$, as expected. We also find $E(3)=\frac{691}{150}\approx 4.607$, $E(4)=\frac{18328}{3675}\approx 4.987$, and $E(5)=\frac{424367}{79380}\approx 5.346$.
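The closed form is easy to evaluate exactly with rational arithmetic. A minimal sketch, with the function name `expected_picks` chosen here for illustration:

```python
from fractions import Fraction
from math import comb

def expected_picks(n):
    """Evaluate E(n) = n * sum_k C(n-1,k-1) (-1)^(n-k) 2^k [1/(2n-k) + 1/(2n-k)^2] exactly."""
    total = Fraction(0)
    for k in range(1, n + 1):
        d = 2 * n - k
        total += comb(n - 1, k - 1) * (-1) ** (n - k) * 2 ** k * (
            Fraction(1, d) + Fraction(1, d * d)
        )
    return n * total

if __name__ == "__main__":
    for n in range(1, 6):
        e = expected_picks(n)
        print(n, e, float(e))
```

Using `Fraction` avoids the cancellation error that the alternating sum could otherwise cause in floating point for larger $n$.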

Grant B.
  • 702
  • 6
  • 14
1

Let $E$ be the expected number of numbers drawn from $[0, 2n - 1]$ until two of the picked numbers sum up to $2n$. Let $E_{x,y}$ be the expected number of additional numbers to be drawn until two of the picked numbers sum up to $2n$, where $x$ is the number of distinct numbers seen so far belonging to the set $S = \{1, 2, 3, \ldots, (n - 1), (n + 1), \ldots, (2n - 1)\}$, and $y \in \{1, 2, 3, 4\}$ records whether $0$ and $n$ have been seen: $y = 1$ means no $0$ and no $n$; $y = 2$ means $0$ seen, no $n$; $y = 3$ means no $0$, $n$ seen; $y = 4$ means both $0$ and $n$ seen.

Then,

$$E = E_{0,1}$$

where,

$$E_{i,1} = \frac{1}{2n}(1 + E_{i,2}) + \frac{1}{2n}(1 + E_{i,3}) + \frac{i}{2n}(1 + E_{i,1}) + \frac{i}{2n}(1) + \frac{2n - 2i - 2}{2n}(1 + E_{i+1,1})$$ $$E_{i,2} = \frac{1}{2n}(1 + E_{i,2}) + \frac{1}{2n}(1 + E_{i,4}) + \frac{i}{2n}(1 + E_{i,2}) + \frac{i}{2n}(1) + \frac{2n - 2i - 2}{2n}(1 + E_{i+1,2})$$ $$E_{i,3} = \frac{1}{2n}(1 + E_{i,4}) + \frac{1}{2n}(1) + \frac{i}{2n}(1 + E_{i,3}) + \frac{i}{2n}(1) + \frac{2n - 2i - 2}{2n}(1 + E_{i+1,3})$$ $$E_{i,4} = \frac{1}{2n}(1 + E_{i,4}) + \frac{1}{2n}(1) + \frac{i}{2n}(1 + E_{i,4}) + \frac{i}{2n}(1) + \frac{2n - 2i - 2}{2n}(1 + E_{i+1,4})$$

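The base cases are $E_{n,1} = E_{n,2} = E_{n,3} = E_{n,4} = 0$. These values never actually contribute, since the coefficient $\frac{2n - 2i - 2}{2n}$ vanishes at $i = n - 1$.

To solve this problem, all we need to do is set up these equations and solve them. The equations can be solved simply by back-substitution, working from $i = n - 1$ down to $i = 0$. I have manually verified the answers for $n = 1$, $n = 2$ and $n = 3$.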
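The back-substitution can be sketched in a few lines. Each equation has the form $E = 1 + (\text{self-loop probability}) \cdot E + (\text{weighted other states})$, so after multiplying through by $2n$ it can be solved for $E$ directly once the states it depends on are known. The function name `expected_by_states` and the evaluation order ($y = 4$, then $2$ and $3$, then $1$) are choices made here for illustration.

```python
from fractions import Fraction

def expected_by_states(n):
    """Solve the E_{i,y} system exactly by back-substitution from i = n-1 down to 0."""
    N = 2 * n
    # E[i][y] for y = 1..4 (index 0 unused); row i = n is the base case, all zeros.
    E = [[Fraction(0)] * 5 for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        new = N - 2 * i - 2  # unseen numbers other than 0, n, and partners of seen ones
        # State 4 (0 and n both seen): self-loops on 0 and on the i seen numbers.
        E[i][4] = (N + new * E[i + 1][4]) / (N - i - 1)
        # State 2 (0 seen, n not seen): drawing n moves to state 4.
        E[i][2] = (N + E[i][4] + new * E[i + 1][2]) / (N - i - 1)
        # State 3 (n seen, 0 not seen): drawing 0 moves to state 4, drawing n finishes.
        E[i][3] = (N + E[i][4] + new * E[i + 1][3]) / (N - i)
        # State 1 (neither seen): drawing 0 moves to state 2, drawing n to state 3.
        E[i][1] = (N + E[i][2] + E[i][3] + new * E[i + 1][1]) / (N - i)
    return E[0][1]

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, expected_by_states(n))
```

For $n = 1$ this reduces to $E_{0,4} = 2$, $E_{0,2} = 4$, $E_{0,3} = 2$ and $E_{0,1} = 4$.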

vamsikal
  • 105