Does this violate the central limit theorem?

Question

Recently I was trying to work out a random generator for a simulation. I wanted to generate multiple "scores", uniformly distributed over some range like 0 to 1, but I also wanted certain subgroups of these scores to be correlated with each other. Rather than thinking about it too hard, I just played with things to find a technique that might work. Here's the algorithm:

For each subgroup $i$, draw a predisposition $x_i = \tan(\theta_i)$ where $\theta_i$ is a uniform random variate in the range ($-\pi/2$, $\pi/2$).

For each element $j$ in subgroup $i$, draw a specific variation $y_{j} = \tan(\phi_j)$ the same way as above. $\phi$ is a uniform random variate in the range ($-\pi/2$, $\pi/2$).

Each "score" $z_{i, j} = \tan^{-1}(c x_i + (1-c) y_j) / \pi + 1/2$. $c$ is a kind of correlation coefficient between 0 and 1. I'm not saying it's actually the correlation coefficient between elements in group $i$, but when c is 0 there's no correlation and when it's 1 there's 100% correlation.

Here's the problem: When I try this and make histograms of $z$, it looks perfectly uniform. It looks like this method works! But it can't really, can it? It's just wrong. This implies that the average of multiple variates $\frac{1}{N} \sum_i^N x_i$ does not converge to zero as $N \rightarrow \infty$ which seems like nonsense to me. It also would imply that this distribution does not follow the central limit theorem.

The proper algorithm is to draw predispositions and specific variations, $x_i$ and $y_j$, from a normal distribution. Then you can take a weighted sum, divide by $\sqrt{c^2 + (1-c)^2}$ and use the error function to get back a uniform distribution. After thinking harder on this, that's the only approach that seems correct to me. So ... why does the tan/arctan approach appear to work so well? Is it actually correct somehow, or does it just work absurdly well as an approximation?

This is a little cheesy, but here's a quick numerical simulation in Python:

import matplotlib.pyplot as plt
from math import tan, atan, pi
import numpy as np
import random
zlist = np.zeros((10_000_000,), dtype=float)
for i in range(len(zlist)):
    x = tan(random.uniform(-pi/2, pi/2))
    y = tan(random.uniform(-pi/2, pi/2))
    zlist[i] = atan((x + y) / 2)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,6))
(counts, xbins, _) = axes[0].hist(zlist, bins=100, range=(-pi/2, pi/2))
axes[1].plot(xbins[:len(counts)], counts)

grand_chat · Accepted Answer · 2023-03-07T00:28:43.783

Each of your score variables $Z$ truly is uniformly distributed over $(-\pi/2, \pi/2)$. The reason is that (1) both $X$ and $Y$ have the Cauchy distribution, and (2) the convex combination $cX + (1-c)Y$ also has the same Cauchy distribution (provided $X$ and $Y$ are independent, which is implicit in your algorithm). Taking the arctangent of this combination then brings you back to the original uniform distribution.

The remarkable behavior embodied in (2) seems to contradict the central limit theorem, because by induction it follows that the average of $n$ independent Cauchy random variables again has the same Cauchy distribution, and won't converge to a normal distribution. However, the Cauchy distribution does not have a finite mean, even though it's symmetric about zero, so the CLT doesn't apply.

Proof of (1): The cumulative distribution function for $X$ is $$F_X(x):=P(X\le x)=P(\tan\Theta\le x)=P(\Theta\le \tan^{-1}( x))=\frac{\tan^{-1}(x)+\frac\pi2}\pi$$ so the density of $X$ is the derivative $$f_X(t)=\frac{dF_X}{dx}=\frac1\pi\frac1{1+x^2},$$ which is the density of the (standard) Cauchy distribution.

Proof of (2): The characteristic function of the Cauchy distribution is $E(e^{itX})=\int_{-\infty}^\infty e^{itx}f_X(x)\,dx = e^{-|t|}$, so if $X$ and $Y$ are independent Cauchy then the characteristic function of the convex combination $cX+(1-c)Y$, for $c\in[0,1]$, is $$E\left(e^{it[cX+(1-c)Y]}\right)=Ee^{itcX}Ee^{it(1-c)Y}=e^{-|ct|}e^{-|(1-c)t|}=e^{-(c+1-c)|t|} $$ which is again the characteristic function of the Cauchy distribution.

Thank you for the excellent explanation! I either didn't know about the Cauchy distribution or had forgotten all about it. Fascinating that it does not have a defined mean! — user149485, Mar 07 '23 at 00:30

Does this violate the central limit theorem?

1 Answers1