Here is a math problem I thought of:
- Suppose you have $N$ = 10 coins: Each coin has a 0.5 probability of landing on heads and a 0.5 probability of landing on tails
- Step 1: Each turn, I randomly select some of these 10 coins (i.e. generate a random integer between 0 and 10, and pick those many coins). Ex: Suppose my random number is 7, using Sampling Without Replacement, I select 7 out of the 10 coins (each coin has equal probability of being selected via Uniform Distribution, i.e. $p$ = 0.1)
- Step 2: I then flip the coins I selected and record the results
- I repeat Step 1 and Step 2 until at least one head has been observed in all 10 coins.
- How many turns are required on average for this to happen?
Part 1: I first tried to solve this problem via simulation using the R programming language (1 = heads, 0 = tails, -1 = not selected):
multi_coins_flip_until_first_head <- function(n) {
count_coins <- numeric(n)
head_seen_coins <- logical(n)
flips_coins <- vector("list", n)
while(any(!head_seen_coins)) {
selected_coins <- sample(seq_len(n), sample(seq_len(n), 1))
for (i in seq_len(n)) {
if (i %in% selected_coins) {
flip <- sample(c(1, 0), 1)
flips_coins[[i]] <- c(flips_coins[[i]], flip)
if (flip == 1 && !head_seen_coins[i]) {
head_seen_coins[i] <- TRUE
count_coins[i] <- length(flips_coins[[i]])
}
} else {
flips_coins[[i]] <- c(flips_coins[[i]], -1)
}
}
}
return(list("Number of Flips for First Head" = count_coins, "Sequence" = flips_coins))
}
The output of this function looks like this:
$`Number of Flips for First Head`
[1] 1 19 2 1 1 4 7 2 4 1
$Sequence
$Sequence[[1]]
[1] 1 0 -1 1 -1 0 0 1 -1 -1 -1 0 1 0 -1 0 0 -1 -1
$Sequence[[2]]
[1] 0 0 -1 -1 -1 -1 -1 -1 -1 -1 -1 0 0 0 0 -1 -1 0 1
$Sequence[[3]]
[1] 0 1 -1 1 0 -1 0 -1 -1 1 -1 1 0 1 -1 -1 0 1 -1
$Sequence[[4]]
[1] 1 0 -1 -1 -1 -1 0 -1 1 -1 1 0 1 1 1 -1 0 -1 -1
$Sequence[[5]]
[1] 1 1 -1 1 -1 0 -1 -1 0 -1 -1 1 1 -1 -1 0 1 -1 -1
$Sequence[[6]]
[1] 0 -1 0 1 -1 0 -1 -1 1 1 -1 0 1 -1 -1 -1 0 -1 -1
$Sequence[[7]]
[1] -1 0 -1 0 -1 0 1 -1 -1 -1 -1 1 1 0 -1 -1 1 0 -1
$Sequence[[8]]
[1] 0 1 0 0 -1 1 0 -1 -1 1 0 1 1 0 -1 -1 -1 0 0
$Sequence[[9]]
[1] 0 0 -1 1 -1 1 -1 -1 -1 -1 0 0 -1 0 -1 -1 0 -1 -1
$Sequence[[10]]
[1] 1 0 -1 -1 -1 1 -1 -1 1 0 -1 0 0 -1 -1 -1 0 -1 -1
Part 2: Then, I simulated this function 1000 times and plotted the results - the average of these simulations should represent the average number of times required to see at least one head in all coins:
multi_coins_flip_until_first_head <- function(n) {
count_coins <- numeric(n)
head_seen_coins <- logical(n)
flips_coins <- vector("list", n)
while(any(!head_seen_coins)) {
selected_coins <- sample(seq_len(n), sample(seq_len(n), 1))
for (i in seq_len(n)) {
if (i %in% selected_coins) {
flip <- sample(c(1, 0), 1)
flips_coins[[i]] <- c(flips_coins[[i]], flip)
if (flip == 1 && !head_seen_coins[i]) {
head_seen_coins[i] <- TRUE
count_coins[i] <- length(flips_coins[[i]])
}
} else {
flips_coins[[i]] <- c(flips_coins[[i]], -1)
}
}
}
return(max(count_coins))
}
results <- replicate(1000, multi_coins_flip_until_first_head(10))
df <- data.frame(Iteration = 1:length(results), Output = results)
Part 3: Then, I plotted the results:
avg <- mean(results)
#[1] 6.904
ggplot(df, aes(x = Iteration, y = Output)) +
geom_line() +
geom_hline(yintercept = avg, color = "red", linetype = "dashed", size = 1) +
labs(title = "Number Of Flips Required To See At Least 1 Head In All Coins",
x = "Iteration",
y = "Number of Flips") + theme_bw()
My Question: (Provided my simulation is correct) Is there a way to come up with a mathematical formula to arrive at this same conclusion?
From a first glance, this problems seems to require combining the Binomial Distribution (result of each coin flip), the Multinomial Distribution (number of coins being chosen in each turn) and the Geometric Distribution (number of cumulative coin flips required for a condition to be met). I suppose a "probability tree" can be constructed that outlines each possibility ... but the tree and the resulting likelihood function will obviously be completely "intractable" .
Thus, for such problems, is simulation the only way to come up with an answer?
Thanks!