1

I guess I will not be the first programmer bumping into another variation of the Coupon Collector Problem: please forgive my ignorance!

Here is my understanding of the problem at hand:

Let $S$ be a set of $N$ different kinds of coupons.

Assume that the coupons are shipped into random packs of at most 1 coupon of each type: that is each pack is a sequence $a_1,...,a_N$ such as $0 \leqslant a_i \leqslant 1$.

Assume the frequency $f_i$ of these 2^N packs are known but may differ, $\sum_{i=1}^{2^N}{f_i}=1$

Assume a person wants to collect all the $g$ different kinds of coupons, by randomly picking packs.

I am interested in two quantities:

  1. What is the probability one person can collect all $N$ coupon kinds by drawing only 2 packs?
  2. What is the probability that two persons get the exact same final, possibly incomplete, collection by drawing 2 (possibly different) packs?

I have seen proofs of the batched version by inclusion/exclusion, (eg here) that I could possibly use and adapt, but -being a dumb programer- I would be very interested by a pointer to an existing solution to this exact (weighted) problem?

WaterFox
  • 139

1 Answers1

1

This isn’t actually a variation of the coupon collector’s, since you’re only ever drawing $2$ packs and not waiting to complete a collection.

The first probability is

$$ \sum_{i\cup j=[N]}f_if_j\;, $$

where the indices $i,j$ run over all subsets of $[N]=\{1,\ldots,N\}$.

The second probability is

$$ \sum_{k\subset[N]}\left(\sum_{i\cup j=k}f_if_j\right)^2\;. $$

Performing this sum directly would take $O\left(4^N\right)$ time. You can reduce that to $O\left(3^N\right)$ like this: Represent $i$ and $j$ as binary $N$-tuples with the values $0$ (“excluded”) and $1$ (“included”) and consider ternary $N$-tuples that can additionally have the value $2$ (“don’t care”). There are $3^N$ such tuples. For each such tuple $k$, define $f_k$ as the sum of the $f_i$ for all $i$ that match $k$ where $k$ is $0$ or $1$ (i.e. that exclude all the excluded coupons and include all the included coupons, irrespective of whether they include the “don’t care” coupons). You can efficiently compute these $f_k$ recursively, for each $2$ component adding the corresponding values for $0$ and $1$. Now for each $j$, instead of pairing it will all $i$, you only have to pair it with all $k$ that have “don’t care” where $j$ has a $1$, since those values don’t affect the union with $j$. If $j$ has $\ell$ $1$s, there are $2^{N-\ell}$ such $k$ to pair it with, and there are $\binom N\ell$ values of $j$ with $\ell$ $1$s, so the total number of pairs to add up is

$$ \sum_{\ell=0}^N\binom N\ell2^{N-\ell}=3^N\;. $$

I doubt this can be further reduced to $O\left(2^N\right)$, but I might be wrong.

joriki
  • 238,052
  • 1
    Ooooh nice. I know now that I was looking in vain ahaha I was indeed hoping there would be a more convenient way to evaluate those and also I trust you we can't really avoid these sums! Also thank you for precising that the coupon's collector problem HAS to be waiting for a complete pack :) – WaterFox Feb 10 '23 at 05:46
  • 1
    @WaterFox: Reading that you trust my unguarded remark that I doubt this can be done more efficiently caused me to think a bit more deeply about it :-) I edited the answer to include a way to compute the second probability in $O\left(3^N\right)$ instead of $O\left(4^N\right)$ time. (I wouldn't say that a problem definitely has to include waiting for a complete pack to count as a variation of the coupon collector's problem; there are also questions about partial collections – but I'd say it does have to include some element of making a variable amount of draws and not just $2$.) – joriki Feb 10 '23 at 07:10
  • 1
    Ooooh I will look into that, it looks much more like my earlier attempts! Should I now declare I deeeefinitely trust you, so you come up with an O(1) solution? ( •̀ᴗ•́ )و ̑̑ – WaterFox Feb 10 '23 at 08:56
  • Omg I am reading through, this is so smart to pair j with k rather than i !!! – WaterFox Feb 10 '23 at 09:02