Recently I was trying to work out a random generator for a simulation. I wanted to generate multiple "scores", uniformly distributed over some range like 0 to 1, but I also wanted certain subgroups of these scores to be correlated with each other. Rather than thinking about it too hard, I just played with things to find a technique that might work. Here's the algorithm:
For each subgroup $i$, draw a predisposition $x_i = \tan(\theta_i)$ where $\theta_i$ is a uniform random variate in the range ($-\pi/2$, $\pi/2$).
For each element $j$ in subgroup $i$, draw a specific variation $y_{j} = \tan(\phi_j)$ the same way as above. $\phi$ is a uniform random variate in the range ($-\pi/2$, $\pi/2$).
Each "score" $z_{i, j} = \tan^{-1}(c x_i + (1-c) y_j) / \pi + 1/2$. $c$ is a kind of correlation coefficient between 0 and 1. I'm not saying it's actually the correlation coefficient between elements in group $i$, but when c is 0 there's no correlation and when it's 1 there's 100% correlation.
Here's the problem: When I try this and make histograms of $z$, it looks perfectly uniform. It looks like this method works! But it can't really, can it? It's just wrong. This implies that the average of multiple variates $\frac{1}{N} \sum_i^N x_i$ does not converge to zero as $N \rightarrow \infty$ which seems like nonsense to me. It also would imply that this distribution does not follow the central limit theorem.
The proper algorithm is to draw predispositions and specific variations, $x_i$ and $y_j$, from a normal distribution. Then you can take a weighted sum, divide by $\sqrt{c^2 + (1-c)^2}$ and use the error function to get back a uniform distribution. After thinking harder on this, that's the only approach that seems correct to me. So ... why does the tan/arctan approach appear to work so well? Is it actually correct somehow, or does it just work absurdly well as an approximation?
This is a little cheesy, but here's a quick numerical simulation in Python:
import matplotlib.pyplot as plt
from math import tan, atan, pi
import numpy as np
import random
zlist = np.zeros((10_000_000,), dtype=float)
for i in range(len(zlist)):
x = tan(random.uniform(-pi/2, pi/2))
y = tan(random.uniform(-pi/2, pi/2))
zlist[i] = atan((x + y) / 2)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(12,6))
(counts, xbins, _) = axes[0].hist(zlist, bins=100, range=(-pi/2, pi/2))
axes[1].plot(xbins[:len(counts)], counts)