
I'm repeating an action every ten minutes; it occurs at the $10n$-th minute of every hour. If instead I repeat it every $\{10, 11\}$ minutes, chosen 50/50, every minute of the hour becomes equally likely in the limit. Why?

Formally: let $k$ be given, and let $X_1, X_2, \ldots$ be i.i.d., each uniformly distributed on a subset of $\{0, \ldots, k-1\}$ containing two values $d_a$ and $d_b$ with $\gcd(d_b - d_a, k) = 1$. Let $Y_j = (\sum_{i=1}^j X_i)\ \textrm{mod}\ k$. Then I conjecture that the distribution of $Y_n$ converges to the uniform distribution on $\{0, \ldots, k-1\}$ as $n$ goes to infinity. Is this true? Why?

Also, can the uniformity requirement on $X_i$ be relaxed? And what does convergence of a function (e.g. the probability mass function of $Y_j$) even mean here? Since it is non-zero at only finitely many points, I guess pointwise convergence is what I want. My own proof idea uses the CLT: since its density function is (uniformly?) continuous, increasing $n$ increases the spread, so the difference in probability mass across $k$ consecutive mod-buckets can be bounded arbitrarily close to $0$. Is this a fruitful path?
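
For concreteness, here is a minimal numerical sketch (not a proof) of the $k = 60$, $\{10, 11\}$ case: it evolves the exact distribution of $Y_n$ by shifting and averaging, and tracks the sup-distance to the uniform distribution.

```python
# Exact distribution of Y_n for k = 60 and steps 10 or 11 with prob. 1/2 each
# (illustrative check of the conjecture, not a proof).
import numpy as np

k = 60
dist = np.zeros(k)
dist[0] = 1.0                                   # Y_0 = 0 with probability 1
for n in range(1, 20001):
    # Y_n = (Y_{n-1} + X_n) mod k: shift by 10 or 11 and average
    dist = 0.5 * np.roll(dist, 10) + 0.5 * np.roll(dist, 11)
    if n % 5000 == 0:
        print(n, np.abs(dist - 1 / k).max())    # sup-distance to uniform shrinks
```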

  • Related question shows conditions where the fractional part of a sum of i.i.d variables converges to uniform in distribution: https://math.stackexchange.com/questions/4157329/if-x-is-gaussian-prove-that-x-lfloor-x-rfloor-sim-u0-1-as-its-variance – Peter O. Aug 20 '21 at 18:37

4 Answers


Denote the Fourier transform of $X_1$ by

$$ \varphi(j) = \mathbb{E}[\exp\{2\pi i j X_1 / k\}], $$

and regard $\varphi$ as a function on $\mathbb{Z}/k\mathbb{Z}$. Since the points on $\partial\mathbb{D}$ are extreme points of the convex set $\overline{\mathbb{D}}$, we know that $|\varphi(j)| = 1$ if and only if $\exp\{2\pi i j X_1 /k\}$ is constant a.s. This amounts to saying that

$$ x \equiv x' \pmod{k/\gcd(k,j)} $$

whenever $x$ and $x'$ lie in the support of $X_1$. Now under the OP's assumption, this can occur only for $j = 0$: for $j \neq 0$ we have $k/\gcd(k,j) > 1$, and $d_a \equiv d_b \pmod{k/\gcd(k,j)}$ would force $k/\gcd(k,j)$ to divide $\gcd(d_b - d_a, k) = 1$, a contradiction. Consequently, $|\varphi(j)| < 1$ for $ j \neq 0$ and hence

$$ \mathbb{E}[\exp\{2\pi i j Y_n / k\}] = \varphi(j)^n \to \mathbb{1}_{\{j = 0\}}. $$

Since $\mathbb{1}_{\{j = 0\}}$ is the Fourier transform of the uniform distribution over $\mathbb{Z}/k\mathbb{Z}$, it follows that $Y_n$ converges in distribution to the uniform distribution over $\mathbb{Z}/k\mathbb{Z}$.
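
As a quick numerical illustration of the argument above (with an assumed example $k = 7$ and $X_1$ uniform on $\{0, 3\}$, chosen only for concreteness), one can compute $\varphi$, check that $|\varphi(j)| < 1$ for $j \neq 0$, and invert $\varphi^n$ to recover an almost-uniform distribution for $Y_n$:

```python
# Illustration with an assumed example: k = 7, X_1 uniform on {0, 3},
# so gcd(3 - 0, 7) = 1.
import numpy as np

k, xs = 7, np.array([0, 3])

# phi(m) = E[exp(2*pi*i*m*X_1/k)]
phi = np.array([np.exp(2j * np.pi * m * xs / k).mean() for m in range(k)])
print(np.abs(phi).round(4))                     # equals 1 only at m = 0

n = 200
phi_n = phi ** n                                # Fourier transform of Y_n
# Fourier inversion on Z/kZ: P(Y_n = a) = (1/k) sum_m phi_n(m) exp(-2*pi*i*m*a/k)
dist = np.array([(phi_n * np.exp(-2j * np.pi * np.arange(k) * a / k)).mean()
                 for a in range(k)]).real
print(dist.round(4))                            # approximately 1/7 everywhere
```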

Sangchul Lee

Let $(X_n:n\in\mathbb{N})$ be an i.i.d. sequence of random variables supported in $\{0,\ldots,k-1\}$. If $S_{k, n}=\big(\sum^n_{j=1}X_j\big)\bmod k$, then $\frac{1}{k}S_{k, n}$ is supported on $\{j/k: 0\leq j< k\}$. Hence it is enough to consider discrete measures with finite support on the circle $\mathbb{S}^1=\mathbb{R}/\mathbb{Z}$.

The Fourier transform of the uniform distribution $\mu(j/k)=\frac{1}{k}$ for $0\leq j<k$ is $$\hat{\mu}(m)=\sum^{k-1}_{j=0}\frac{1}{k}e^{-2\pi imj/k}=\frac{1}{k}\,\frac{1-e^{-2\pi i m}}{1-e^{-2\pi im/k}}=\mathbb{1}_{k\mathbb{Z}}(m),$$ where the geometric-sum expression applies for $m\notin k\mathbb{Z}$ (for $m\in k\mathbb{Z}$ every term equals $\frac1k$).

Let $\{X_n\}$ be an i.i.d. sequence supported in $\{0,1/k,\ldots,(k-1)/k\}$ with distribution $\nu$ such that there are $j_1<j_2$ with $\gcd(j_2-j_1,k)=1$ and $\nu(j_1/k)\nu(j_2/k)>0$. Let $S_n=\sum^n_{j=1}X_j$. Then $$ \hat{\nu}_{S_n}(m)=\big(\hat{\nu}(m)\big)^n=\Big(\int e^{-2\pi imx}\,\nu(dx)\Big)^n=\Big(\sum^{k-1}_{j=0}\nu(j/k)e^{-2\pi ijm/k}\Big)^n. $$

Notice that if $m\in k\mathbb{Z}$, then $\hat{\nu}(m)=1$. Conversely, suppose $m$ is such that $|\hat{\nu}(m)|=\Big|\int^1_0 e^{-2\pi i xm}\,\nu(dx)\Big|=1$, and let $\theta\in[0,1)$ be such that $e^{2\pi i\theta}\hat{\nu}(m)=1$. Then $$1=e^{2\pi i\theta}\int^1_0e^{-2\pi i mx}\,\nu(dx)=\int^1_0\cos(2\pi(mx-\theta))\,\nu(dx),$$ which means that $mx-\theta\in\mathbb{Z}$ for $\nu$-a.e. $x$, i.e. $\operatorname{supp}(\nu)\subset\frac{\theta}{m}+\frac{1}{m}\mathbb{Z}$. Since $\operatorname{supp}(\nu)\subset\{j/k: 0\leq j<k\}$, it follows that for any two points $\frac{j_1}{k},\frac{j_2}{k}$ in $\operatorname{supp}(\nu)$ with $j_1<j_2$, there is an integer $p$ such that $$j_2-j_1=\frac{kp}{m}<k.$$

By assumption, there is a pair $j_1/k$ and $j_2/k$ in $\operatorname{supp}(\nu)$ with $0\leq j_1<j_2<k$ and $\gcd(j_2-j_1,k)=1$, so $$1=\ell_1(j_2-j_1)+\ell_2 k$$ for some $\ell_1, \ell_2\in\mathbb{Z}$. Multiplying by $m$ and using $m(j_2-j_1)=kp$ gives $$m=\ell_1m(j_2-j_1)+\ell_2mk=k(\ell_1p+m\ell_2),$$ that is, $m=p'k$ for some $p'\in\mathbb{Z}$. This shows that $|\hat{\nu}(m)|<1$ unless $m\in k\mathbb{Z}$. As a consequence, $$\lim_n\hat{\nu}_{S_n}(m)=\lim_n\big(\hat{\nu}(m)\big)^n=\mathbb{1}_{k\mathbb{Z}}(m).$$ This means that the distribution of $S_n\bmod 1$ converges weakly to the uniform distribution over $\{j/k: 0\leq j<k\}$; equivalently, $Y_n$ converges in distribution to the uniform distribution over $\{0,\ldots,k-1\}$.
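
A small sanity check of the key step, that $|\hat{\nu}(m)| = 1$ exactly when $k \mid m$ under the $\gcd$ assumption (the support and weights below are assumed for illustration only):

```python
# Assumed non-uniform nu on {j/k}: support {0, 5, 10} in Z/12Z, gcd(5 - 0, 12) = 1.
import numpy as np

k = 12
nu = np.zeros(k)
nu[[0, 5, 10]] = [0.5, 0.3, 0.2]

def nu_hat(m):
    # hat{nu}(m) = sum_j nu(j/k) * exp(-2*pi*i*j*m/k)
    return np.sum(nu * np.exp(-2j * np.pi * np.arange(k) * m / k))

for m in range(3 * k):
    assert (abs(abs(nu_hat(m)) - 1) < 1e-12) == (m % k == 0)
print("|nu_hat(m)| = 1 exactly when k divides m, for m = 0, ...,", 3 * k - 1)
```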

Mittens

Let us start with the easiest case, where $P(X_i = 0) = P(X_i = 1) = 1/2.$ Define $Z_j = \sum_{i=1}^j X_i$. In this case $Z_j$ is binomially distributed with $j$ trials and success probability $1/2$. Assume for simplicity that $k$ divides $j$ (terms with $ki + a > j$ vanish anyway, since the binomial coefficient is then $0$). Therefore, for all $0\leq a<k$ $$ P(Y_j=a) = \sum_{i=0}^{ j/k} P(Z_j = ki + a) = 2^{-j} \sum_{i=0}^{ j/k} \binom{j}{ki+a}. $$ Now, using $$ \sum_{i=0}^{j/k} \binom{j}{ki+a}= \frac{1}{k}\sum_{i=0}^{k-1}\omega^{-i\,a} \big(1+\omega^i\big)^{j},$$ where $\omega$ is a primitive $k$-th (complex) root of unity (this is proven for example in lacunary sum of binomial coefficients), we obtain $$ P(Y_j=a) = \sum_{i=0}^{ j/k} P(Z_j = ki + a) = \frac{1}{k}\sum_{i=0}^{k-1}\omega^{-i\,a} \frac{\big(1+\omega^i\big)^{j}}{2^j}. $$ In the latter sum, all terms with $i\neq0$ converge to zero (since $|1+\omega^i| < 2$ for $i\neq0$) and thus $$ P(Y_j=a) \to\frac{1}{k} $$ as $j\to\infty$, as desired.
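
The roots-of-unity filter and the resulting formula for $P(Y_j=a)$ can be checked numerically; the sketch below uses arbitrary illustrative values of $k$ and the number of trials (playing the role of $j$ above) and compares the filter against the direct binomial sum.

```python
# Check of the roots-of-unity filter for the fair {0, 1} step; k and the
# number of trials are arbitrary illustrative choices.
from math import comb
import cmath

k, trials = 5, 40                               # k divides the number of trials
omega = cmath.exp(2j * cmath.pi / k)            # primitive k-th root of unity
for a in range(k):
    direct = sum(comb(trials, k * i + a) for i in range(trials // k + 1)) / 2 ** trials
    filtered = sum(omega ** (-i * a) * (1 + omega ** i) ** trials
                   for i in range(k)) / (k * 2 ** trials)
    print(a, round(direct, 6), abs(filtered.real - direct) < 1e-9)
```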

I am unsure, however, whether such a combinatorial argument can easily be extended to arbitrary subsets. I believe the CLT might be helpful, but I don't know how to apply it exactly.

Regarding your question about the meaning of convergence of random variables: there are several definitions, which are presented, e.g., at https://en.wikipedia.org/wiki/Convergence_of_random_variables. In your case we speak of convergence in distribution, i.e., the probability masses converge: $P(Y_j=a) \to 1/k$ for all $0\leq a<k$ as $j \to \infty$. There are stronger forms of convergence, which would need to be proven separately.

Andreas Lenz

Let $X_n$ be independent and identically distributed with support on a subset of $\{0, \ldots, k-1\}$ for some $k$, with $d_1, d_2$ given such that $\gcd(d_2 - d_1, k) = 1$ and $P(X_i = d_1)\,P(X_i = d_2) > 0$.

Then there exist integers $s, t$ such that $$\begin{eqnarray} s(d_2 - d_1) + tk & = & 1 \\ & = & 1 + k(d_2 - d_1) - k(d_2 - d_1) \\ & = & (s+k)(d_2 - d_1) + (t - d_2 + d_1)k. \end{eqnarray} $$ Since $s$ can always be increased by $k$, choose such an $s$ that is positive (e.g. the minimal one). After $s$ transitions, if all of them are $d_1$ then a particular residue $m$ is reached; if all of them are $d_2$ then $m + 1$ is reached. Thus, after $sk$ transitions (grouped into $k$ blocks of $s$) every residue is reachable: you can reach $km + \sum_{i=1}^k (0\textrm{ or }1) = km + r$ for any $0 \le r \le k$ (let $r$ of the blocks consist of $d_2$-steps), and these values cover every residue class mod $k$.
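
The Bézout/reachability step can be illustrated with a short sketch (the values of $k$, $d_1$, $d_2$ below are hypothetical): compute $s, t$ by the extended Euclidean algorithm, shift $s$ to be positive as above, and verify that $sk$ transitions reach every residue.

```python
# Bezout/reachability sketch with hypothetical values satisfying gcd(d2 - d1, k) = 1.
from math import gcd

k, d1, d2 = 12, 3, 8                            # gcd(8 - 3, 12) = 1
assert gcd(d2 - d1, k) == 1

def bezout(a, b):
    # extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = bezout(b, a % b)
    return g, y, x - (a // b) * y

_, s, t = bezout(d2 - d1, k)
while s <= 0:                                   # shift s by k, as in the identity above
    s, t = s + k, t - (d2 - d1)
assert s * (d2 - d1) + t * k == 1

# k blocks of s transitions; r blocks use only d2, the remaining blocks only d1
reached = {(k * s * d1 + r * s * (d2 - d1)) % k for r in range(k)}
assert reached == set(range(k))
print("s =", s, "-> every residue mod", k, "is reachable after", s * k, "steps")
```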

Let $Y_n = (\sum_{i=1}^n X_i)\ \textrm{mod}\ k$. Then the $Y_n$ form a Markov chain (since the $X_i$ are i.i.d.). Let $\mathbf{P}$ be its transition matrix. Since every state is reachable from every other state, with positive probability, after the fixed number $sk$ of transitions, the chain is regular (in the terminology of Grinstead and Snell, see http://pi.math.cornell.edu/~web3040/amsbook.mac-probability.pdf).

Since the chain is regular, $\mathbf{P}^n$ converges as $n \rightarrow \infty$ to a matrix $\mathbf{W}$ all of whose rows equal some probability vector $\mathbf{w}$ (a vector $\mathbf{v}$ is a probability vector if every entry is in $[0, 1]$ and the entries sum to 1). Furthermore, if $\mathbf{v} = \mathbf{vP}$ then $\mathbf{v}$ is a multiple of $\mathbf{w}$ (see Grinstead and Snell).

Let $\mathbf{1}_k$ be the row vector with $k$ entries all equal to 1. Then $$ (\mathbf{1}_k \mathbf{P})_{1,c} = \sum_{r=0}^{k-1} p_{r, c} = \sum_{d=0}^{k-1} P(X_* = d) = 1, $$ since $p_{r,c} = P(X_* = (c - r)\ \textrm{mod}\ k)$ and $(c - r)\ \textrm{mod}\ k$ runs over all of $\{0, \ldots, k-1\}$ as $r$ does. Hence $\frac{1}{k}\mathbf{1}_k$ is a probability vector fixed by $\mathbf{P}$, so $\mathbf{w} = [\frac{1}{k} \cdots \frac{1}{k}]$, i.e. $P(Y_n = i) \rightarrow 1/k$ as $n \rightarrow \infty$ for every $i \in \{0, \ldots, k-1\}$. QED.
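
Here is a minimal numerical sketch of this argument (the step distribution is an assumed example): build the transition matrix $\mathbf{P}$, note that its columns sum to 1 as in the $\mathbf{1}_k \mathbf{P}$ computation, and check that $\mathbf{P}^n$ approaches the matrix with all entries $1/k$.

```python
# Transition matrix of Y_n for an assumed step distribution on Z/12Z with
# gcd(8 - 3, 12) = 1; check that P^n approaches the all-1/k matrix.
import numpy as np

k = 12
px = np.zeros(k)
px[[3, 8]] = [0.4, 0.6]                         # P(X = 3), P(X = 8)

P = np.zeros((k, k))
for r in range(k):
    for d in range(k):
        P[r, (r + d) % k] += px[d]              # transition r -> (r + d) mod k

print(P.sum(axis=0))                            # columns sum to 1 (the 1_k P computation)
Pn = np.linalg.matrix_power(P, 500)
print(np.abs(Pn - 1 / k).max())                 # tiny: every row is nearly uniform
```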

Some generalizations: a Markov chain with matrix $\mathbf{Q}$ is doubly stochastic if every row of $\mathbf{Q}^{T}$ is a probability vector, i.e. every column of $\mathbf{Q}$ sums to 1. Every doubly stochastic Markov chain has the uniform distribution over its states as a stationary distribution; if the chain is also regular, then $(\mathbf{Q}^n)_{i,j} \rightarrow \frac{1}{k}$ as $n \rightarrow \infty$, where $k$ is the number of states.

Let $f_1, \ldots, f_n$ be permutations of a finite set of states $S$ such that $$\forall (i, j, s): i \not= j \Rightarrow f_i(s) \not= f_j(s)$$

If each transition randomly chooses an $f_i$ according to any fixed distribution and maps every state $s$ to $f_i(s)$, then the Markov chain is doubly stochastic (each column of the transition matrix sums to $\sum_i p_i = 1$, since each $f_i$ is a permutation). If the set of states is a group and $f_{g_i}(g_j) = g_i \ast g_j$, then the $f_{g_i}$ form such a family. The integers modulo $k$ under addition are such a group.
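
A tiny sketch of this construction for $\mathbb{Z}/k\mathbb{Z}$ (the weights are arbitrary illustrative choices): build the transition matrix from the translation maps $f_g(s) = (g + s)\ \textrm{mod}\ k$ and check that it is doubly stochastic.

```python
# Translation maps f_g(s) = (g + s) mod k on Z/kZ with an arbitrary choice
# distribution: the resulting chain is doubly stochastic.
import numpy as np

k = 6
weights = np.array([0.1, 0.0, 0.5, 0.0, 0.4, 0.0])   # P(choose f_g), g = 0..k-1

Q = np.zeros((k, k))
for g in range(k):
    for s in range(k):
        Q[s, (g + s) % k] += weights[g]              # state s maps to f_g(s)

print(Q.sum(axis=0))   # every column sums to 1, so the uniform vector is stationary
print(Q.sum(axis=1))   # rows sum to 1, as for any transition matrix
```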

For any such family $f_{g_1}, \ldots, f_{g_n}$ over a cyclic group, if $g_i \ast g_j^{-1}$ is a generator of the group for some $i$ and $j$ chosen with positive probability, the Markov chain is regular (and thus convergent). If $\gcd(d_2 - d_1, k) = 1$, then $d_2 - d_1$ is such a generator of $\mathbb{Z}/k\mathbb{Z}$.

Furthermore, if there exist products $g_{i_1} \ast \cdots \ast g_{i_a} = e = g_{j_1} \ast \cdots \ast g_{j_b}$ with $0 < a < b$ and $\gcd(a, b) = 1$, and $\{g_1, \ldots, g_n\}$ generates the group, then the Markov chain is regular: take, for each of the $|G|$ elements, a product generating it, and pad these products with $e$-valued products of length $a$ or $b$ until all generating sequences have the same length (possible since every sufficiently large integer is a non-negative combination of the coprime lengths $a$ and $b$). Now every element can be reached from $e$ in some fixed number of steps; but then every element can be reached from every other element in a fixed number of steps. This assumes every element involved is chosen with positive probability.