
Would appreciate some help with a question. Before anyone asks, it's not homework and I'm not a professional maths person, only someone who is somewhat inquisitive!

I have a series of probabilities related to "events" or "incidents" that may occur on a particular day.

$P(n)$ is the probability of "n" incidents occurring on a particular day.

$P(0)=0.55, P(1)=0.28, P(2)=0.15, P(3)=0.01, P(4)=0.01$

I would like to calculate the probability of more than a total of "x" incidents occurring over a period of "y" days.

For instance, what is the probability of more than 20 incidents happening over a period of 30 days?

I have no clue how I would go about doing this. Can anyone suggest a method?

Mark
  • Welcome to math.SE. Here's a tutorial and reference for typesetting math on this site. – joriki Mar 01 '20 at 08:43
  • is that better? – Mark Mar 01 '20 at 08:46
  • Do you have the source to the original problem? – Someone Mar 01 '20 at 08:48
  • that is the original problem.... – Mark Mar 01 '20 at 08:53
  • I have added an example, if that helps. – Mark Mar 01 '20 at 08:58
  • Yes, much better :-). You could also replace the quoted variable names "x" by italicized variable names $x$. On the content: As Mick says, it's a large sum for which you'll probably need a computer; but for parameters like $20$ and $30$ in your example, you can get a good approximation by approximating the distribution by a Gaussian distribution with the same mean and variance. – joriki Mar 01 '20 at 09:12
  • Could you pump out an answer that describes that? – Mark Mar 01 '20 at 09:16

2 Answers


Assuming that the numbers of incidents $N_i$ occurring on different days are independent, the probability in question is $$P\left(\sum_{i=1}^y N_i >x\right)=\sum_{k=x+1}^{4y} P\left(\sum_{i=1}^y N_i =k\right). $$ For a given $k>x$ we have to find all ways of writing $k= \sum_{j=0}^4 ja_j$ with $\sum_{j=0}^4 a_j = y$, where $a_j$ denotes the number of days on which $j$ incidents occurred. For a given tuple $(a_0,\dots,a_4)$ the probability is $\frac{y!}{a_0!a_1!\cdots a_4!}p_0^{a_0}p_1^{a_1} \cdots p_4^{a_4}$, where $p_j = P(j)$. So $$P\left(\sum_{i=1}^y N_i >x\right)= \sum_{k=x+1}^{4y} \ \sum_{\substack{\sum_{j=0}^4 ja_j=k \\ \sum_{j=0}^4 a_j=y}}\frac{y!}{a_0!a_1!\cdots a_4!}p_0^{a_0}p_1^{a_1} \cdots p_4^{a_4} $$ For general $x$ and $y$ this is not an easy task to carry out by hand.
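For anyone who wants to see the double sum evaluated, here is a minimal Python sketch of it. It loops over every tuple $(a_0,\dots,a_4)$ of day counts that adds up to $y$ days (a constraint implied by $a_j$ counting days), keeps the ones whose incident total exceeds $x$, and adds up the multinomial terms. The function name `prob_more_than` is made up for this illustration, and the daily probabilities are the ones from the question.

```python
from math import factorial

# Daily distribution from the question: p[j] = P(j incidents on one day).
p = [0.55, 0.28, 0.15, 0.01, 0.01]

def prob_more_than(x, y):
    """P(total incidents over y days > x), by summing multinomial terms.

    a_j is the number of days with exactly j incidents; the a_j must
    add up to y, and the total number of incidents is sum(j * a_j).
    """
    total = 0.0
    for a4 in range(y + 1):
        for a3 in range(y - a4 + 1):
            for a2 in range(y - a4 - a3 + 1):
                for a1 in range(y - a4 - a3 - a2 + 1):
                    a0 = y - a4 - a3 - a2 - a1
                    k = a1 + 2 * a2 + 3 * a3 + 4 * a4
                    if k <= x:
                        continue  # only totals strictly greater than x count
                    # Multinomial coefficient y! / (a0! a1! a2! a3! a4!)
                    coef = factorial(y) // (
                        factorial(a0) * factorial(a1) * factorial(a2)
                        * factorial(a3) * factorial(a4)
                    )
                    total += (coef * p[0] ** a0 * p[1] ** a1 * p[2] ** a2
                              * p[3] ** a3 * p[4] ** a4)
    return total

# The example from the question: more than 20 incidents in 30 days.
print(prob_more_than(20, 30))
```

For $y=30$ there are only a few tens of thousands of tuples to visit, so the brute-force loop finishes quickly; for much longer periods a smarter method would be preferable.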

Mick
  • if we were to limit the question to "more than 20 incidents over a period of 30 days" how does the formula fold down? I am assuming it would be easier to look at a specific example. Being a non-maths person I'm not sure how you get to the RHS you have stated at the top, considering there are only 5 possible probabilities. Not saying you're wrong - at all - this is just me banging up against the limits of my understanding (and not having a PhD!) – Mark Mar 01 '20 at 09:15
  • ok just reading up on partitions.... Is there a way of assuming anything about the partitions, for instance where there are 20 incidents over 30 days (I assume) the greatest probability would be a partition based on $P(0)$ and $P(1)$. Is it ok to think about it in this way? – Mark Mar 01 '20 at 09:21
  • The maximal number of incidents over $y$ days is $4y$, since at most 4 incidents can occur on each day. More than $x$ incidents means that either $x+1$ or $x+2$ or ... or $4y$ incidents occurred. These are all disjoint events. It's not the number of incidents per day that makes this hard, but the number of ways to get $k$ incidents over $y$ days. Take $y=2$. Then to get $6$ incidents in total you either have $2$ incidents on the first day and $4$ on the second, or $3$ on each day, etc. Does this help? – Mick Mar 01 '20 at 09:22
  • Yes, I think this is what you're referring to when you are talking about partitions. It might be easier to see how it falls out when just assuming a simple partition based on $P(0)$ and $P(1)$ i.e. $10.P(0) + 20.P(1)$ – Mark Mar 01 '20 at 09:24
  • If $y$ is not much greater than $x$ then mostly you will have 0's and 1's. If $y=30$ and $x=20$ and $k=21$ then one possible partition would be $21= 5\cdot 4 +1$, i.e. five days with 4 incidents each and 1 day with 1 incident. – Mick Mar 01 '20 at 09:26
  • ... but a very low probability of this scenario occurring I assume... – Mark Mar 01 '20 at 09:28
  • Well, each "elementary" event has a low probability, but adding them up may result in a significant probability. By elementary event I mean a fixed partition. E.g. in my previous comment, the probability that there are 5 days with four incidents, 1 day with 1 incident and 24 days with none is $\frac{30!}{5!\,1!\,24!}\, p_0^{24}p_1p_4^{5}$. – Mick Mar 01 '20 at 11:20

Let's say $X_i$ is the number of incidents on day $i$, and $Y$ is the total number of incidents in $30$ days, so $Y=\sum_{i=1}^{30} X_i$. We assume that the $X_i$'s are independent and all have the stated distribution.

By calculation from the probabilities provided, the mean of $X_i$ is $\mu_X=0.65$, and the variance is $\sigma^2_X =0.7075$. So the mean of $Y$ is $\mu_Y = 30 \mu_X = 19.5$, and the variance is $\sigma^2_Y = 30 \sigma^2_X = 21.225$.
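As a quick check on the arithmetic, the mean and variance can be reproduced in a few lines of Python. This is only a sketch; the variable names are made up for illustration, and the independence assumption from above is what justifies multiplying the variance by 30.

```python
# Daily distribution from the question: p[n] = P(n incidents on one day).
p = [0.55, 0.28, 0.15, 0.01, 0.01]
days = 30

# Mean and variance of a single day's count X_i.
mean_x = sum(n * pn for n, pn in enumerate(p))
var_x = sum(n ** 2 * pn for n, pn in enumerate(p)) - mean_x ** 2

# For independent, identically distributed days, both scale with the number of days.
mean_y = days * mean_x
var_y = days * var_x

print(round(mean_x, 4), round(var_x, 4))  # 0.65 0.7075
print(round(mean_y, 4), round(var_y, 4))  # 19.5 21.225
```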

It seems reasonable to approximate $Y$ with a Normal distribution with mean $\mu_Y$ and variance $\sigma^2_Y$. Then $Z= (Y-\mu_Y)/\sigma_Y$ has approximately a Normal(0,1) distribution. Since $Y$ only takes integer values, "more than 20" means "21 or more", so we use the halfway point $20.5$ as the cutoff (a continuity correction). Using tables of the normal distribution or software, we find $P(Y < 20.5) \approx 0.586$, so $P(Y > 20.5) = 1 - P(Y < 20.5) \approx \boxed{0.414}$.
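If you would rather not look anything up in tables, the same number can be obtained with the Python standard library. This is a minimal sketch: `normal_cdf` is a helper written for this illustration (using `math.erf`), and the mean and variance are the values computed above.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative distribution function of the standard Normal(0,1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mu_y = 19.5        # mean of the 30-day total
var_y = 21.225     # variance of the 30-day total
threshold = 20.5   # cutoff between 20 and 21 incidents

# Standardise the cutoff and take the upper tail.
z = (threshold - mu_y) / sqrt(var_y)
p_more_than_20 = 1.0 - normal_cdf(z)

print(round(z, 3), round(p_more_than_20, 3))
```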

It is also possible to find an exact probability that $Y > 20$ using a probability generating function and a computer algebra system, such as Mathematica. It turns out that the exact probability is $0.400276$ (to six digits), so the Normal approximation is pretty good. For those interested in such things, the probability generating function of $Y$ is $$f(x) = \left(0.55\, +0.28 x+0.15 x^2+0.01 x^3+0.01 x^4\right)^{30}$$ The probability that $Y \le 20$ is the sum of the coefficients of $x^n$ for $0 \le n \le 20$ when $f(x)$ is expanded, and the probability that $Y > 20$ is one minus that sum.
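You don't strictly need a computer algebra system for the expansion. Here is a rough sketch in plain Python that multiplies the polynomial by itself 30 times, storing only the list of coefficients; `poly_mul` and `total_distribution` are names invented for this illustration. It works in floating point, so the result is exact only up to rounding error.

```python
# Coefficients of the one-day generating function: p[n] multiplies x^n.
p = [0.55, 0.28, 0.15, 0.01, 0.01]

def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def total_distribution(p, days):
    """Coefficients of (one-day polynomial)^days; entry n is P(Y = n)."""
    coeffs = [1.0]  # the constant polynomial 1, i.e. zero days
    for _ in range(days):
        coeffs = poly_mul(coeffs, p)
    return coeffs

dist = total_distribution(p, 30)

# P(Y > 20) = 1 - (sum of the coefficients of x^0 ... x^20).
p_more_than_20 = 1.0 - sum(dist[:21])
print(p_more_than_20)
```

Changing `30` and the slice `dist[:21]` gives the answer for other values of $y$ and $x$.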

If you are interested in learning about generating functions in general (including probability generating functions), a number of resources can be found in the answers to this question: How Can I Learn About Generating Functions?

awkward