
I am looking for a formula that evaluates the mean distance from the origin after $N$ equal steps of a random walk in $d$-dimensional space. Such a formula was given by "Henry" in answer to a question by "Diego" (q/103170):

$$\sqrt{\dfrac{2N}{d}} \dfrac{\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}$$

I would be very grateful if you could give me a reference to an article that shows how this formula was derived. Thanks!

Picard Porath

2 Answers


The formula is not exact, but asymptotic. Informally: let $z_i = x_i - y_i$ be the $i$-th coordinate after $N$ steps, where $x_i$ ($y_i$) is the number of steps taken in the positive (negative) direction along that axis. When $N$ is large, $\{x_i,y_i\}$ tend to iid Poisson variables, with $\lambda=E(x_i) = \frac{N}{2 d} = Var(x_i)$. Applying the CLT, $z_i$ approaches a normal distribution with zero mean and variance $Var(x_i)+Var(y_i)=\frac{N}{d}$.

We are interested in $E(\sqrt{z_1^2 + \cdots + z_d^2})$. But the square root of a sum of squares of $d$ iid $N(0,\sigma^2)$ variables follows a (scaled) Chi distribution, with mean $\sqrt{2 \sigma^2} \dfrac{\Gamma(\frac{d+1}{2})}{\Gamma(\frac{d}{2})}$. Substituting $\sigma^2 = \frac{N}{d}$, you get the desired formula.
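This asymptotic claim is easy to check numerically. Below is a minimal Monte Carlo sketch (the helper names `mc_mean_distance` and `chi_formula` are mine, not from the answer): simulate lattice walks of $N$ unit steps in $d$ dimensions and compare the sample mean distance with the Chi-based formula.

```python
import math
import random

def mc_mean_distance(N, d, trials=20000, seed=0):
    """Monte Carlo estimate of the mean distance from the origin after
    N unit steps of a lattice random walk in d dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(N):
            # each step: pick a random axis, move +1 or -1 along it
            pos[rng.randrange(d)] += rng.choice((-1, 1))
        total += math.sqrt(sum(z * z for z in pos))
    return total / trials

def chi_formula(N, d):
    """The asymptotic formula sqrt(2N/d) * Gamma((d+1)/2) / Gamma(d/2)."""
    return math.sqrt(2 * N / d) * math.gamma((d + 1) / 2) / math.gamma(d / 2)
```

For moderate sizes such as $N = 100$, $d = 3$ the two values typically agree to within a couple of percent, consistent with the formula being asymptotic rather than exact.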

leonbloy
  • Perhaps I am missing something, but where did the Poisson variables come in? It seems to me that it is immediate from the CLT that $z_i$ is approximately normal, since it is the sum of $N$ iid random variables (with values $\pm 1$). Also, I think $x_i, y_i$ would be approximately normal, not Poisson (the distributions of the summands are not changing with $N$) and not independent of each other (since they must sum to $N$). Otherwise, I agree with the conclusion. – Nate Eldredge Apr 11 '12 at 02:27
  • @NateEldredge: the CLT is immediate only in 1D; in more dimensions $z_i$ is not the sum of $N$ variables but of $n_i$, which is itself a random variable (with $\sum n_i = N$), hence the CLT is not so clear here. Instead, it's clear that $x_1,y_1,x_2,y_2,\dots$ is identical to an urns-and-balls model ($2d$ urns, $N$ balls), which is equivalent ("Poissonization") to $2d$ iid Poisson variables conditioned on their sum being $N$ (asymptotically, this conditioning becomes irrelevant). – leonbloy Apr 11 '12 at 13:54
  • Oh right. I was confused and thinking of something else. Thanks for the clarification. – Nate Eldredge Apr 11 '12 at 14:00
  • @leonbloy:Can you give any reference to a publication where this formula was mentioned? Thanks – Picard Porath Sep 07 '13 at 13:59
  • Just to add, I'd be very grateful for a textbook or paper reference for this. Finding it very useful, but hard to credit! – DRG Jul 08 '15 at 13:52
  • @leonbloy I read your exchange with Nate and wanted to learn more about Poissonization. To be clear: the variables $\{x_i, y_i\}$ obey a multinomial distribution, not Poisson, right? And if we approximate $n_i$ with $\mathrm{Pois}(\lambda) = \mathrm{Pois}(N/2d)$, aren't we implicitly assuming that the mean of $n_i$, $N/2d$, approaches a constant value as $N \to \infty$? This fails, and your solution is correct (and intuitive), so I'm wondering how to make this rigorous. – Titus Sep 23 '15 at 11:29

Let $\vec{R}$ be the end-to-end distance vector of a random walk of fixed step length $|\vec{r}_i| = l$. $\vec{R}$ can then be expressed as $\displaystyle \vec{R} = \sum_{i=1}^N \vec{r}_i$, where $\vec{r}_i$ is the vector of the $i$-th step. The root-mean-square end-to-end distance is given by $\textrm{RMS}=\sqrt { \langle R^2 \rangle }$.

Since the steps are mutually independent, the covariance of two steps $\vec{r}_i$ and $\vec{r}_j$ is zero if $i\neq j$, and $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j)= \textrm{Var}(\vec{r}_i)$ if $i=j$. The variance of $ \vec{r}_i$ can be expressed as $ \textrm{Var}(\vec{r}_i)= \langle \vec{r}_i \cdot \vec{r}_i \rangle - \langle \vec{r}_i \rangle^2$. By symmetry $\langle \vec{r}_i \rangle=\vec{0}$, so the variance of $ \vec{r}_i$ is simply $ \textrm{Var}(\vec{r}_i)= \langle \vec{r}_i \cdot \vec{r}_i \rangle = |\vec{r}_i|^2 = l^2$. Altogether, $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j)=\delta_{ij}l^2$. Since the covariance can also be expressed as $\textrm{Cov}(\vec{r}_i, \ \vec{r}_j) = \langle \vec{r}_i \cdot \vec{r}_j \rangle - \langle \vec{r}_i \rangle \cdot \langle \vec{r}_j \rangle$ and $\langle \vec{r}_i \rangle=\vec{0}$, combining the two expressions gives $\langle \vec{r}_i \cdot \vec{r}_j \rangle =\delta_{ij}l^2$. This result can be used to determine the RMS:

$$\textrm{RMS}=\sqrt { \langle R^2 \rangle } = \sqrt { \langle \vec{R} \cdot \vec{R} \rangle } =\sqrt { \big\langle \sum_{i=1}^N \vec{r}_i \cdot \sum_{j=1}^N \vec{r}_j \big\rangle } =\sqrt { \sum_{i=1}^N \sum_{j=1}^N \langle \vec{r}_i \cdot \vec{r}_j \rangle } =\sqrt { \sum_{i=1}^N \sum_{j=1}^N l^2 \delta_{ij} + 0^2}= $$ $$=\sqrt { \sum_{i=1}^N l^2}=\sqrt { N l^2}=l\sqrt { N }$$
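The identity $\textrm{RMS} = l\sqrt{N}$ holds for any step distribution with fixed length and zero-mean direction, so it can be verified with a quick simulation. A minimal sketch (the helper name `mc_rms` is mine), using 2-D walks with unit-length steps in uniformly random directions:

```python
import math
import random

def mc_rms(N, trials=20000, l=1.0, seed=0):
    """Monte Carlo RMS end-to-end distance for 2-D walks whose steps
    have fixed length l and uniformly random direction."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        x = y = 0.0
        for _ in range(N):
            theta = rng.uniform(0.0, 2.0 * math.pi)  # random step direction
            x += l * math.cos(theta)
            y += l * math.sin(theta)
        total_sq += x * x + y * y
    # RMS = sqrt of the sample mean of R^2; should approach l * sqrt(N)
    return math.sqrt(total_sq / trials)
```

Unlike the mean distance, this RMS result is exact for every $N$, since the derivation above uses only independence and symmetry, with no normal approximation.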

Let $Z_i$ denote the $i$-th coordinate of the end-to-end distance vector $\vec{R}$ after $N$ steps, and let $X_i$ and $Y_i$ denote the number of steps taken in the $i$-th dimension in the positive and negative direction respectively. Then the set of random variables $\{X_i, Y_i\}_{i=1}^d$ follows a multinomial distribution with parameters $N$ and $\displaystyle p_i=\frac{1}{2d}$. For sufficiently large values of $N$, $\{X_i, Y_i\}_{i=1}^d$ are approximately iid (independent and identically distributed) Poisson random variables with parameter $\displaystyle \lambda = \frac{N}{2d}$. For $\lambda > 20$, i.e. $N>40d$, $\textrm{Po}(\lambda) \approx \textrm{N}(\lambda, \lambda)$. Since $ Z_i = l(X_i - Y_i)$, it follows that $\displaystyle Z_i \sim \textrm{N}(l(\lambda - \lambda), l^2(\lambda+\lambda))=\textrm{N}(0, 2l^2\lambda)=\textrm{N}\left(0, \frac{l^2N}{d}\right)$.

$\displaystyle \langle R \rangle = \langle \sqrt{R^2} \rangle = \left\langle \sqrt{ \sum_{i=1}^d Z_i^2} \right\rangle$. The square root of a sum of squares of $k$ independent $\textrm{N}(0, 1)$-distributed random variables is distributed according to the chi distribution, $\chi_k$. Therefore $\displaystyle \sqrt{ \sum_{i=1}^d \frac{dZ_i^2}{l^2N}}$ is approximately $\chi_d$-distributed for large values of $N$. The expected value of a $\chi_k$-distributed random variable is $\displaystyle \sqrt{2} \frac{ \Gamma \left(\frac{k+1}{2}\right) }{\Gamma \left( \frac{k}{2}\right)}$.
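The chi-distribution mean used here can itself be sanity-checked by sampling. A minimal sketch (the helper names `mc_chi_mean` and `chi_mean` are mine): draw $k$ standard normals, take the norm, and compare the sample mean with $\sqrt{2}\,\Gamma(\frac{k+1}{2})/\Gamma(\frac{k}{2})$.

```python
import math
import random

def mc_chi_mean(k, trials=100000, seed=0):
    """Monte Carlo mean of the norm of k iid standard normals,
    i.e. an empirical estimate of E[chi_k]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)))
    return total / trials

def chi_mean(k):
    """Exact mean of the chi distribution with k degrees of freedom."""
    return math.sqrt(2.0) * math.gamma((k + 1) / 2) / math.gamma(k / 2)
```

For $k = 1$ this reduces to the half-normal mean $\sqrt{2/\pi}$, a familiar special case that makes the gamma-ratio expression easy to trust.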

Hence $\displaystyle \langle R \rangle =\left\langle\sqrt{ \sum_{i=1}^d Z_i^2}\right\rangle =\left\langle l \sqrt{\frac{N}{d}} \sqrt{ \sum_{i=1}^d \frac{dZ_i^2}{l^2N} }\right\rangle = l \sqrt{\frac{2N}{d} }\frac{ \Gamma \left(\frac{d+1}{2}\right) }{\Gamma \left( \frac{d}{2}\right)}$.

Filip