
Hello all, I'm trying to do an estimation problem at work and wondering if I'm on the right track!

I'm running a study, and it's on the internet. I'm trying to determine how many people I need to show an advertisement to in order to have 80%-95% confidence that I reach half of a target group. Here are the numbers: the population is 220,000,000, and the sample of people I'm trying to reach is 1000 people. So the question is: how many advertisements will I need to show in order to be 80%-95% certain that I hit at least 500 of those 1000 people, without replacement?

My first thought is that:

the sum of 1000/220,000,000 + 999/219,999,999 + 998/219,999,998..... = 0.2275003444% which is the probability of success of hitting all 1000 people without replacement

the sum of 500/220,000,000 + 499/219,999,999 + 498/219,999,998..... = 0.0569319906% which is the probability of hitting 500 of the 1000 people without replacement

I'm having trouble with the next step: how do I estimate how many times I need to show an advertisement to those 220,000,000 people to be 80%-95% confident that I hit at least 500?

I cannot help but think this is now a binomial estimation problem, and that I need to set it equal to .80-.95 and solve for k. Am I thinking about this right, and if so, how do you solve for k? Is that possible?

$\binom{n}{k}\,p^k (1-p)^{n-k} = .8$, solve for k

moku
  • I'm not sure I follow your two formulas. If you want the "probability of success of hitting all $1000$ people without replacement", you should multiply those numbers, not add them. – Caleb Stanford May 09 '14 at 23:07
  • ah dang you are right... thanks for that! so that is essentially zero...hmmm – moku May 10 '14 at 01:30

1 Answer


The population size of $220,000,000$ is irrelevant; the relevant population is the fixed sample/control group of $1000$.

On average, $\Bbb E(X)=1000p$ of these have seen the ad, with a variance of $\Bbb V(X)=1000p(1-p)$, according to a binomial distribution of the random variable $X\sim B(1000,p)$, where $p$ is the probability that a single person in the group has seen the ad. To be absolutely precise, you would have to solve the equation in which the sum of the probabilities for $X=500,501,...,1000$ reaches the target probability, $$ \sum_{k=500}^{1000}\binom{1000}{k}p^k(1-p)^{1000-k}=p_{target}\in[0.80,0.95]. $$ Since for $p=0.5$ this sum is $\approx0.5$ and for $p=1$ it is $1$, you can solve this via bisection or regula falsi or some kind of bracketed Newton in a small number of steps.
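As a rough illustration of the bisection route, here is a minimal Python sketch. The function names `binom_tail` and `solve_coverage`, the bracket $[0.5, 0.6]$ and the tolerance are my own choices for illustration, not part of the derivation; the tail sum is evaluated in log space only to avoid overflow in the binomial coefficients.

```python
import math

def binom_tail(p, n=1000, k_min=500):
    """P(X >= k_min) for X ~ B(n, p), with each term computed in log space."""
    log_p, log_1mp = math.log(p), math.log1p(-p)
    total = 0.0
    for k in range(k_min, n + 1):
        # log of the binomial coefficient C(n, k) via log-gamma
        log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        total += math.exp(log_coeff + k * log_p + (n - k) * log_1mp)
    return total

def solve_coverage(target, lo=0.5, hi=0.6, tol=1e-9):
    """Bisection on p: the tail probability increases with p.

    Assumes binom_tail(lo) < target < binom_tail(hi), which holds for
    targets between 0.80 and 0.95 with the default bracket.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_tail(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for target in (0.80, 0.95):
    print(target, round(solve_coverage(target), 4))
```

The two printed values are the exact-binomial counterparts of the coverage range obtained from the normal approximation, and should be close to it.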


However, the sample is large enough that you can approximate the binomial distribution $X\sim B(1000,p)$ by the normal distribution $X\sim N(1000p, 1000p(1-p))$ with the same expectation and variance. This can be related to a standard normal random variable $Y\sim N(0,1)$ via the linear transformation $$ X=1000p+\sqrt{1000p(1-p)}Y, \quad \text{or }Y=F(X,p)=\frac{X-1000p}{\sqrt{1000p(1-p)}}. $$ This approximation is good for $5\% < p < 95\%$.


Now you can work with the quantiles of the normal distribution. For $$ p_{target}=P(X\ge 500)=P(Y\ge -q)=P(-Y\le q)\in [80\%,95\%] $$ one needs $Y=F(X,p)\ge F(500,p) =-q$ with $q\in[0.84,1.65]$, where $q$ is the quantile of the standard normal distribution for the probability $p_{target}$, or equivalently $-q$ is the quantile for the probability $1-p_{target}$.


Obviously, $p>0.5$. So solve \begin{align} -q=F(500,p)=\frac{500-1000p}{\sqrt{1000p(1-p)}} &\iff 1000(1-2p)^2= q^2\,(1-(1-2p)^2)\\ &\iff (1000+q^2)(1-2p)^2= q^2\\ &\iff p= \frac12+\frac{|q|}{2\sqrt{1000+q^2}} \end{align} which gives a necessary coverage $p$ between $51.3\%$ and $52.6\%$.
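The closed form is easy to evaluate numerically. The following Python sketch uses the standard library's `statistics.NormalDist` for the quantile $q$; the function name `coverage_from_target` and the parameter `n` are illustrative choices of mine, and the formula only applies when the threshold is half the group (here $500$ of $1000$).

```python
from math import sqrt
from statistics import NormalDist

def coverage_from_target(p_target, n=1000):
    """Coverage p from p = 1/2 + |q| / (2*sqrt(n + q^2)).

    q is the standard normal quantile for p_target. This is the normal
    approximation for "at least n/2 of n people reached".
    """
    q = NormalDist().inv_cdf(p_target)
    return 0.5 + abs(q) / (2 * sqrt(n + q * q))

for p_target in (0.80, 0.95):
    print(p_target, round(coverage_from_target(p_target), 4))
```

For $p_{target}=0.80$ and $0.95$ this reproduces the range of roughly $51.3\%$ to $52.6\%$.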


This computation gives you the cumulative probability that a person has seen one of the ads. For the extended problem, to get the required number of showings of the ad, the coverage estimation for a single showing is missing. If this were, for example, $10\%$, then the probability to have seen at least one of $k$ showings is at least $52\%$ (just another example value from the computed range) if $1-(1-10\%)^k\ge 52\%$, so you would have to calculate $$ k\ge\log(0.48)/\log(0.9)=6.9662..., $$ that is, at least $7$ showings.
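The last step can be sketched in a few lines of Python. The name `showings_needed` and its parameters are hypothetical, the $10\%$ reach per showing is only the example value used above, and the sketch assumes (as the formula does) that each showing reaches a given person independently with the same probability.

```python
import math

def showings_needed(coverage, reach_per_showing):
    """Smallest integer k with 1 - (1 - reach_per_showing)**k >= coverage.

    Assumes independent showings with identical reach (an assumption,
    not something the problem statement provides).
    """
    return math.ceil(math.log(1 - coverage) / math.log(1 - reach_per_showing))

print(showings_needed(0.52, 0.10))  # example values from the text
```

With a required coverage of $52\%$ and a reach of $10\%$ per showing, this rounds $6.9662...$ up to $7$.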

Lutz Lehmann
  • ah thank you for this! Can you explain to me why the population is irrelevant? Also i kind of get the converges in distribution to normal stuff, what do Y and X represent in this case? – moku May 11 '14 at 15:59
  • Because the event that you are considering only concerns the control group of the 1000 preselected people. And even if you were to select randomly 1000 people after the fact, the probability of at least 500 among them having seen the ad does not depend on the general population size. This only changes if the individual probability to have seen the ad is not independent of the general population, i.e., if you guarantee that for instance exactly 120,000,000 of them have seen the ad. – Lutz Lehmann May 11 '14 at 16:13
  • Ok I think I understand. So i'm now using Y as my pivot between the critical values that represent .8 and .95. I'm still a bit confused as to what Y X and p now represent. Y is the normal distribution of our bin(n,p) and X is a random variable in that distribution i.e the # of ad i'm looking for and p is the probability of success, correct? how do I get my probability of success? – moku May 11 '14 at 16:21
  • $X$ and $Y$ already are the normal approximations of the binomial random variables. In binomial variables, you would have to evaluate the sum $$\sum_{k=500}^{1000}\binom{1000}{k}p^k(1-p)^{1000-k}=p_{target}\in[0.80,0.95].$$ Since for $p=0.5$ this sum is $\approx0.5$ and for $p=1$ it is $1$, you can solve this via bisection or regula falsi or some kind of bracketed Newton in a small number of steps, but the gain in precision is likely not very important. – Lutz Lehmann May 11 '14 at 16:21
  • phew dunno if I can do that ha. 1.28sqrt(1000p(1-p))+1000p < x < 1.96sqrt(1000p(1-p))-1000p. That is my estimation for # of ads but what is p because i already used .80-.95 z-critical to define my confidence level? Really appreciating you help here! – moku May 11 '14 at 16:37
  • This computation gives you the cumulative probability that a person has seen one of the ads. For the extended problem, the coverage estimation for a single showing is missing. If this were, for example, 10%, then the probability to have seen at least one of $k$ showings is at least 52% if $1-(1-10\%)^k\ge 52\%$, so you would have to calculate $k\ge\log(0.48)/\log(0.9)=6.9662...$. – Lutz Lehmann May 11 '14 at 16:56