
Let $p,\mu_c \in [0,1]$. Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of i.i.d. random variables, each following a Bernoulli distribution $\mathcal{B}(p)$. For $n \in \mathbb{N}$, let $ \mu_n = \frac{1}{n+1} \sum_{i=0}^n X_i $ and let $A_n$ be the event "$\mu_n \leq \mu_c$". I am looking for a way to compute $$ \mathbb{P} \left(\bigcap_{n \in \mathbb{N}} A_n \right), $$ that is, the probability that the running mean of the sequence never exceeds the critical value $\mu_c$.

You can assume $\mu_c > p$, since this probability is $0$ when $\mu_c \leq p$.

I first wanted to compute it using a martingale, but the optional stopping theorem does not seem to help. I don't know whether computing it "by hand" (combinatorially) is possible, but it would be quite unsatisfying anyway. It may possibly be linked to this post, but I don't understand Brownian motion well enough to use it correctly.

Thank you for any advice!

Edit: When $p = \frac{1}{2}$ and $\mu_c = \frac{3}{4}$, simulations give $$\mathbb{P} \left(\bigcap_{n \leq 1000} A_n \right) \approx 0.456.$$ I think this is a very good approximation of $\mathbb{P} \left(\bigcap_{n \in \mathbb{N}} A_n \right)$ (going up to $n = 2000$ does not change the result). This fact may be used to check any suggested answer. I find it surprisingly high, since $\mathbb{P}(A_0)$ alone is equal to $\frac{1}{2}$.
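For reference, here is a minimal sketch of such a simulation (not the exact code I ran, but equivalent in spirit, with $p = 1/2$ and $\mu_c = 3/4$):

import random

def never_exceeds(p, mu_c, n_max):
    # simulate one path of partial sums and check mu_n <= mu_c up to n_max draws
    s = 0
    for n in range(1, n_max + 1):
        s += random.random() < p   # add one Bernoulli(p) draw
        if s > mu_c * n:           # equivalent to mu_n > mu_c
            return False
    return True

trials = 100_000
hits = sum(never_exceeds(0.5, 0.75, 1000) for _ in range(trials))
print(hits / trials)   # approximately 0.456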

Argemione
  • Very interesting question! Just a quick observation: I think the answer is not gonna be continuous in $\mu_c$. E.g. if $\mu_c = 2/3$ then the sequence $(0, 1, 1, 0, \dots)$ passes the test (so far), but if $\mu_c = 2/3 - \epsilon$ then the sequence $(0, 1, 1, 0, \dots)$ fails the test immediately. And given the law of large numbers, any violation is likely to be at the beginning. – antkam May 28 '20 at 15:49
  • Thanks. I was aware of this fact, but it's worth pointing out. In fact, writing $\mu_n < \mu_c$ instead of $\mu_n \leq \mu_c$ in the code gives a different numerical answer. I am afraid it decreases our chances of finding an elegant closed-form solution... – Argemione May 28 '20 at 16:09
  • Could it even be that it is nowhere continuous in $\mu_c$? – Argemione May 28 '20 at 16:15
  • Oh wow, nowhere continuous... That seems intuitively plausible and (if true) totally awesome! A nowhere continuous function that is also monotonic! – antkam May 28 '20 at 16:58
  • On second thought, it seems more plausible the function is discontinuous at rationals (and only at rationals)... – antkam May 28 '20 at 17:00
  • It is an increasing function, so it can only be discontinuous at a countable set of points, and yes, I think this set is rationals. I'm trying to prove it. Following your first comment, given $\frac{a}{b}$, I'm considering the sequences starting with $(0,...,0,1,...,1,...)$ with $(b-a)$ 0s and $a$ 1s. – Argemione May 28 '20 at 17:03
  • Very interesting question! Could it be easier to find the probability of the complementary event? – P3rs3rk3r May 28 '20 at 17:27
  • Applying De Morgan's law, you would need to find the following probability: $$\mathbb{P} \left(\bigcup_{n \in \mathbb{N}} A^C_n \right)$$ – P3rs3rk3r May 28 '20 at 17:30
  • @Argemione Re: "It is an increasing function, so it can only be discontinuous at a countable set of points" – no wonder I can't think of any textbook examples of nowhere continuous functions that are also monotonic. :) BTW, this statement is proved here – antkam May 28 '20 at 18:30
  • This Python code calculates the required probability (approximately). You can decrease eps to get more accuracy:

    def g(p, mu_c, comp = 0, mult = 1, eps = 0.0000001):
        if comp > 0: return 0
        elif mult < eps: return mult
        else: return g(p, mu_c, comp+1-mu_c, p*mult, eps) + g(p, mu_c, comp-mu_c, (1-p)*mult, eps)

    – Varun Vejalla May 28 '20 at 20:06
  • @VVejalla - Do you mind posting that as an Answer and also explaining the logic? – antkam May 28 '20 at 20:36
  • @antkam I just finished posting an answer for that. – Varun Vejalla May 28 '20 at 21:43

1 Answer


Let $f_{p, \mu_c}(s, n)$ be the probability of never exceeding $\mu_c$, given a sum of $s$ after $n$ iterations. The answer is then $f_{p, \mu_c}(0, 0)$. The running average is $\frac{s}{n}$, so clearly $$f_{p, \mu_c}(s, n)=0 \quad \text{if } \frac{s}{n} > \mu_c, \text{ i.e. if } s-\mu_c n > 0.$$

If this condition is not satisfied, then $$f_{p, \mu_c}(s, n) = pf_{p, \mu_c}(s+1, n+1) + (1-p)f_{p, \mu_c}(s, n+1)$$

since there is probability $p$ that $s$ increases by $1$ and probability $1-p$ that $s$ stays the same.

This can instead be rewritten with one variable instead of two in the argument. Let $c = s-\mu_c n$; drawing a $1$ sends $c$ to $c+1-\mu_c$, while drawing a $0$ sends $c$ to $c-\mu_c$, so $$P_{p, \mu_c}(c) = pP_{p, \mu_c}(c+1-\mu_c) + (1-p)P_{p, \mu_c}(c-\mu_c),$$ with $P_{p, \mu_c}(c) = 0$ whenever $c > 0$.

This Python code implements the recursion in order to calculate the probability. You can decrease eps in order to make it more accurate. With eps = 1e-8, this gives $P_{1/2, 3/4} \approx 0.4565$.

def g(p, mu_c, comp = 0, mult = 1, eps = 1e-6):
    # comp is c = s - mu_c*n; mult is the probability of the path taken so far
    if comp > 0:
        # the running average has exceeded mu_c
        return 0
    elif mult < eps:
        # path probability is negligible: truncate, treating P(comp) as 1
        return mult
    else:
        # branch on the next draw: 1 with probability p, 0 with probability 1-p
        return g(p, mu_c, comp+1-mu_c, p*mult, eps) + g(p, mu_c, comp-mu_c, (1-p)*mult, eps)
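For instance (my own invocation, using the values from the question):

print(g(0.5, 0.75))   # about 0.46 with the default eps; a smaller eps is slower but more accurate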

Edit: I just thought of another approach. Let $L$ be a sequence of $0$s and $1$s of length $n$. Let $$m(L) = \max_{k=1...n} \frac{1}{k} \sum_{i=1}^k L_i$$ be the maximum, over $k$, of the average of the first $k$ elements. For example, $m(\{0, 1, 0, 0, 1\}) = \frac{1}{2}$, attained at $k = 2$.

The answer is then $$\lim_{n \to \infty} \sum_{L \,:\, m(L) \le \mu_c} p^{\sum L} (1-p)^{n-\sum L},$$ where the sum is over all $L$ of length $n$ such that $m(L) \le \mu_c$, and $\sum L$ is the number of $1$s in $L$.

When $p = 1/2$, this is easier to compute, as it can be simplified to $$\lim_{n \to \infty} (1/2)^{n} \underbrace{\sum_{L \,:\, m(L) \le \mu_c}1}_{S_n}$$

With $\mu_c = 3/4$, as in the example in the question, it seems that $S_n = 2S_{n-1}$ unless $n \equiv 1 \pmod 4$; writing $n = 4m+1$ in that case, $S_n = 2S_{n-1} - \frac{1}{3m+1}\binom{4m}{m}$ (the subtracted term is a Fuss-Catalan number). Using this, I get that $$P_{1/2, 3/4} = 1 - \frac{1}{2} \, {}_3F_2\left(\tfrac{1}{4}, \tfrac{1}{2}, \tfrac{3}{4}; \tfrac{2}{3}, \tfrac{4}{3}; \tfrac{16}{27}\right) \approx 0.456.$$ The hypergeometric function is equal to A086253. However, I am not $100\%$ sure about the recurrence for $S_n$, so this is just an educated guess for now. In a similar way, I guess that $$P_{1/2, 2/3} = \frac{-1 + \sqrt{5}}{4}$$
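As a sanity check (a sketch of my own, not part of the original derivation; it assumes the mpmath package for the ${}_3F_2$ evaluation), one can brute-force $S_n$ for small $n$, test the conjectured recurrence, and evaluate the closed-form guess:

from itertools import product
from math import comb
from mpmath import hyper

def m(L):
    # maximum running average over all nonempty prefixes of L
    s, best = 0, 0
    for k, x in enumerate(L, start=1):
        s += x
        best = max(best, s / k)
    return best

mu_c = 3/4
S = [1]  # S_0 = 1 (the empty sequence)
for n in range(1, 14):
    S.append(sum(1 for L in product((0, 1), repeat=n) if m(L) <= mu_c))
    if n % 4 == 1:                      # n = 4q + 1: Fuss-Catalan correction
        q = (n - 1) // 4
        assert S[n] == 2 * S[n - 1] - comb(4 * q, q) // (3 * q + 1)
    else:
        assert S[n] == 2 * S[n - 1]

# closed-form guess for the limit S_n / 2^n
print(1 - 0.5 * hyper([1/4, 1/2, 3/4], [2/3, 4/3], 16/27))  # ~0.4565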

Edit 2: Let $M$ be the sorted list of the positions of the $1$s in $L$, say of length $k$. Let $$h(M) = \max_{i=1...k} \frac{i}{M_i}.$$ Note that $h(M) = m(L)$, since the running average $\frac{1}{j}\sum_{i \le j} L_i$ is largest just after a $1$, at $j = M_i$, where it equals $\frac{i}{M_i}$. Then the above is equivalent to $$\lim_{n \to \infty} \sum_{k=0}^{n} p^{k} (1-p)^{n-k} \sum_{M \in M_{k, n} \,:\, h(M) \le \mu_c} 1,$$ where $M_{k, n}$ is the set of all $\binom{n}{k}$ combinations of size $k$ from $1...n$. For example, $M_{2, 3} = \{\{1, 2 \}, \{1, 3\}, \{2, 3\}\}$. Finding the distribution of $h(M)$ over all $M$ in $M_{k, n}$ is now sufficient for the answer. The Python code for this is below.

from itertools import combinations

def h(M):
    # max of i / M[i-1]: the largest running average, attained just after a 1
    if len(M) == 0:
        return 0
    return max(i/M[i-1] for i in range(1, len(M)+1))

def findProb(p, mu_c, n=10):
    # sum p^k (1-p)^(n-k) over all placements M of k ones among n positions
    # whose running average never exceeds mu_c
    prob = 0
    for k in range(n+1):
        t = sum(1 for M in combinations(range(1, n+1), k) if h(M) <= mu_c)
        prob += p**k*(1-p)**(n-k) * t

    return prob
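For example (my own invocation; truncating at finite $n$ only approximates the limit, and convergence is slow):

print(findProb(0.5, 0.75, n=16))   # about 0.458, approaching ~0.4565 as n grows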
  • Nice answer, even though I am only half satisfied by the recursive definition. Not sure if we can hope for more. I'll wait a few days and accept it. – Argemione May 28 '20 at 22:47
  • +1 Wow, this is a very interesting way to transform the recursion! Can you shed some light on how it works? AFAICT, the main idea / invariant is: $\text{g(p, mu_c, comp, mult)} = mult \times P_{p, \mu_c}(comp)$... am I right? But if so, how can you just return $mult$ when $mult < eps$? Wouldn't that be equivalent to assuming the corresponding $P_{p, \mu_c}(comp) =1$? And yet the code clearly works numerically. I am very confused! Maybe I'm missing something very obvious? :( – antkam May 29 '20 at 21:16
  • @antkam "$\text{g(p, mu_c, comp, mult)} = mult \times P_{p, \mu_c}(comp)$" This is correct. When returning $mult$ when $mult < eps$, it would be equivalent to assuming $P_{p, \mu_c} (comp) = 1$. The logic is that it has already picked so many previous numbers. There is then a very small probability for the average to change enough to ever exceed $\mu_c$. – Varun Vejalla May 29 '20 at 23:01
  • I now think it works because of the following: While it isn't true that $P(comp) = 1$ for $comp < 0$, it is true that $P(comp) = 1$ for $comp \ll 0$! And in your big binary tree, most of the leaf-nodes (i.e. $mult < eps$) will be s.t. $comp \ll 0$, so while the approximation $P(comp) = 1$ is invalid for a small fraction of leaf-nodes, namely those where $comp \lessapprox 0$, it is valid for a much larger fraction of leaf-nodes. I think that's why it works. In any case, this is very neat! – antkam May 30 '20 at 00:51
  • @antkam Yeah that makes a lot more sense than my explanation. I think if there was a gradient for the return value based on $comp$, it would be even more accurate. However, this might be more trouble than it is worth. – Varun Vejalla May 30 '20 at 01:04
  • Gradient! Haha. Well in fact I raised this question because my first thought was: heck if returning $mult$ is valid, then returning $mult/2$ should be equally valid, right? I.e. if you take the know-nothing stance that we dont know $P(comp)$ so might as well assume it is $1/2$. But of course, returning $mult/2$ would result in exactly half the summed value! :D And the numerical evidence is strongly on your side. Hence my question originally... Anyway, very neat! – antkam May 30 '20 at 01:11