I have a challenging problem that I cannot find a way to solve without a Python simulation.
Given a dataset of X entries (X is a very large number), we select H of those entries without replacement, where the order doesn't matter. We then repeat this process N times, each time drawing from the full dataset.
Each selection of H entries is uniform: every entry is equally likely to be picked.
How can I estimate the total number of entries that would be selected more than once across the N rounds?
I know that for a very large X, selection without replacement is not going to be that different from selection with replacement, so I tried modeling it with the binomial distribution, but I cannot wrap my head around how to start the calculation.
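
To make the binomial idea concrete, here is a minimal sketch of one way the calculation could be set up. It assumes that "selected multiple times" means "picked in two or more of the N rounds" and that the rounds are independent. Under those assumptions a given entry is picked in any one round with probability H/X, so the number of rounds in which it appears is Binomial(N, H/X), and the expected count of repeated entries is X times the probability of two or more picks. The function names are made up for illustration, and the small simulation is included only as a sanity check.

```python
import random
from collections import Counter


def expected_multiply_selected(X: int, H: int, N: int) -> float:
    """Expected number of entries picked in two or more of the N rounds.

    Assumption: each round is an independent uniform draw of H entries
    from all X entries, so one entry is picked in a round with p = H / X.
    """
    p = H / X
    p_never = (1 - p) ** N                 # picked in zero rounds
    p_once = N * p * (1 - p) ** (N - 1)    # picked in exactly one round
    return X * (1 - p_never - p_once)


def simulate_multiply_selected(X: int, H: int, N: int, seed: int = 0) -> int:
    """Run the N rounds once and count entries picked at least twice."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(N):
        # random.sample draws H distinct indices (without replacement)
        counts.update(rng.sample(range(X), H))
    return sum(1 for c in counts.values() if c >= 2)
```

As a quick check, `expected_multiply_selected(4, 2, 2)` gives 1.0, which matches the expected overlap H*H/X of two independent rounds. For large X, comparing the formula with the average of `simulate_multiply_selected` over several seeds should show how well the two agree.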