
This question was asked by a colleague, and I need help handling it. For context, this is about artificial intelligence models applied to sensor data. Here's the setup.

  1. Let's start with $N$ sensors. From these sensors, $n$ different sensors are selected randomly.
  2. Some processing is done with data from selected sensors.
  3. A new selection of $n$ sensors is done randomly (previous ones can be selected again), and processing is done with this new set.

Now, let's say that this selection process is done $m$ times. What is the probability that every pair of sensors was selected at least once during this process?

That last part is the tricky one. For example, with 5 sensors and 3 sensors per selection, what is the probability that each of the sensor pairs 1-2, 1-3, 1-4, ..., 3-5 and 4-5 was selected together (both sensors in the same selection) at least once after $m$ drawings?
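A direct simulation is a useful way to sanity-check any formula for this. The sketch below is a hypothetical helper (`simulate_all_pairs_covered` is not from the question). It assumes each drawing is a uniformly random set of $n$ distinct sensors, independent of earlier drawings, and it counts a pair as covered once both of its sensors appear in the same drawing.

```python
import itertools
import random


def simulate_all_pairs_covered(N, n, m, trials=100_000, seed=0):
    """Monte Carlo estimate of the probability that every pair of the N
    sensors appears together in at least one of m draws.

    Assumption: each draw is a uniformly random set of n distinct sensors,
    independent of all previous draws.
    """
    rng = random.Random(seed)
    all_pairs = set(itertools.combinations(range(N), 2))
    hits = 0
    for _ in range(trials):
        covered = set()
        for _ in range(m):
            # Sort the draw so pairs come out as (smaller, larger),
            # matching the tuples in all_pairs.
            draw = sorted(rng.sample(range(N), n))
            covered.update(itertools.combinations(draw, 2))
        if covered == all_pairs:
            hits += 1
    return hits / trials


# Example from the question: 5 sensors, 3 per selection, m = 10 selections.
print(simulate_all_pairs_covered(5, 3, 10, trials=20_000))
```

The estimate has a standard error of roughly $\sqrt{p(1-p)/\text{trials}}$, so it is good for checking a formula but not for resolving very small probabilities.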

I had a look at the hypergeometric distribution, but I'm not sure it is what I need.

I hope this is clear enough. Regards

Francois

  • Assuming your numbers are large, some rough work can be done by computing the mean and the variance of the number of pairs that are missed. Both of those can be worked out via linearity of expectation. Of course, that's not enough to give an exact answer, but maybe it's good enough? – lulu Jan 30 '23 at 18:47
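The mean and variance suggested in this comment can be worked out exactly. A single drawing contains a fixed pair with the hypergeometric probability $p = \binom{N-2}{n-2}/\binom{N}{n} = \frac{n(n-1)}{N(N-1)}$. So, with $X$ the number of pairs never drawn together,
$$E[X] = \binom{N}{2}(1-p)^m.$$
For the variance, one also needs the probability that a drawing misses two given pairs. That probability depends on whether the pairs share a sensor (3 distinct sensors) or are disjoint (4 distinct sensors). The two bounds $1 - E[X] \le P(X=0) \le \operatorname{Var}(X)/E[X^2]$ (union bound and second-moment bound) then bracket the probability asked for. The sketch below assumes uniform, independent drawings; the function names are hypothetical.

```python
from math import comb


def missed_pairs_mean_var(N, n, m):
    """Mean and variance of X, the number of sensor pairs never drawn
    together in m independent, uniform draws of n distinct sensors out of N."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    total = comb(N, n)

    def p_contains(k):
        # Hypergeometric: probability that one draw contains k given sensors.
        return comb(N - k, n - k) / total if n >= k else 0.0

    p2, p3, p4 = p_contains(2), p_contains(3), p_contains(4)
    q1 = (1 - p2) ** m                   # a given pair is missed by every draw
    q_shared = (1 - 2 * p2 + p3) ** m    # two pairs sharing a sensor both missed
    q_disjoint = (1 - 2 * p2 + p4) ** m  # two disjoint pairs both missed

    n_pairs = comb(N, 2)
    n_shared = N * comb(N - 1, 2)                  # unordered pairs of pairs sharing one sensor
    n_disjoint = comb(N, 2) * comb(N - 2, 2) // 2  # unordered pairs of disjoint pairs

    mean = n_pairs * q1
    var = (n_pairs * (q1 - q1 ** 2)
           + 2 * n_shared * (q_shared - q1 ** 2)
           + 2 * n_disjoint * (q_disjoint - q1 ** 2))
    return mean, var


def coverage_probability_bounds(N, n, m):
    """Lower (union bound) and upper (second-moment bound) on P(X = 0)."""
    mean, var = missed_pairs_mean_var(N, n, m)
    var = max(var, 0.0)  # guard against tiny negative rounding error
    lower = max(0.0, 1.0 - mean)
    second_moment = var + mean ** 2
    upper = var / second_moment if second_moment > 0 else 1.0
    return lower, upper


print(missed_pairs_mean_var(5, 3, 10))
print(coverage_probability_bounds(5, 3, 10))
```

When $E[X]$ is small, the lower bound $1 - E[X]$ is usually the more informative of the two, and it needs only the mean.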

1 Answer


First, note that if $n=1$ the probability is $0$, since no pairs are ever selected. For $n>1$, the trick is to count how many pairs of sensors each selection contains, rather than how many sensors. A selection of $n$ sensors contains $\binom{n}{2}$ pairs, and in total there are $\binom{N}{2}$ pairs of sensors. At this point, you have a variation of the coupon collector's problem in which coupons are drawn in batches (also see here), with $\binom{N}{2}$ coupons in total and $\binom{n}{2}$ coupons drawn each time. It is only a variation because the $\binom{n}{2}$ pairs in one selection are not an arbitrary set of pairs: they all come from the same $n$ sensors. Unfortunately, while the probability can be written down explicitly, the expression is ugly and hardly enlightening. If you're looking for something practical, it is probably more valuable to find a heuristic that can be used.
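For small $N$ the explicit expression can still be evaluated exactly. By inclusion-exclusion over sets $S$ of pairs,
$$P(\text{all pairs covered}) = \sum_{S} (-1)^{|S|}\, r(S)^m,$$
where $r(S)$ is the fraction of the $\binom{N}{n}$ possible selections that contain no pair from $S$. This formula is for the sensor problem itself, so it accounts for the pairs within one selection sharing sensors. The hypothetical helper below evaluates it by brute force with exact fractions, assuming uniform, independent selections. Its cost grows like $2^{\binom{N}{2}}\binom{N}{n}$, so it is only practical up to about $N = 6$.

```python
from fractions import Fraction
from itertools import combinations


def exact_all_pairs_covered(N, n, m):
    """Exact probability that every pair of the N sensors is selected together
    at least once in m independent, uniform selections of n distinct sensors,
    via inclusion-exclusion over sets of pairs (brute force, small N only)."""
    pairs = list(combinations(range(N), 2))
    index = {pair: i for i, pair in enumerate(pairs)}
    # Encode each possible selection as a bitmask of the pairs it covers.
    draw_masks = []
    for draw in combinations(range(N), n):
        mask = 0
        for pair in combinations(draw, 2):
            mask |= 1 << index[pair]
        draw_masks.append(mask)
    total = len(draw_masks)
    prob = Fraction(0)
    for S in range(1 << len(pairs)):
        # Number of selections that cover none of the pairs in S.
        avoiding = sum(1 for d in draw_masks if d & S == 0)
        sign = -1 if bin(S).count("1") % 2 else 1
        prob += sign * Fraction(avoiding, total) ** m
    return prob


# The example from the question: 5 sensors, 3 per selection.
for m in (4, 6, 10):
    print(m, float(exact_all_pairs_covered(5, 3, m)))
```

With $n = 2$ each selection covers exactly one pair, and the formula reduces to the ordinary coupon collector's probability for $\binom{N}{2}$ coupons, which is a convenient check.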

QC_QAOA