Let $n \ge 2$ and $T > n$ be integers. We consider a sample covariance matrix, i.e. $c := {\bar C} \cdot Y \cdot Y^T \cdot {\bar C}^T \quad (1)$ where $Y$ is an $n \times T$ random matrix with i.i.d. standard Gaussian entries and ${\bar C}$ is a non-random $n \times n$ matrix. We call the matrix $C := {\bar C} \cdot {\bar C}^T$ the underlying correlation matrix. The question is to compute the expected value of the logarithm of the determinant of the sample covariance matrix, $E \left[ \log \det\left(c\right) \right] = ?$.
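As a quick sanity check of the setup, here is a sketch in Python/NumPy (not part of the question; the sizes, seed, and sample count are arbitrary choices): since the entries of $Y$ are i.i.d. standard Gaussians, $E[Y Y^T] = T\,I_n$, and hence $E[c] = T\,C$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 8

# A fixed non-random mixing matrix Cbar and the underlying C = Cbar Cbar^T
Cbar = rng.standard_normal((n, n))
C = Cbar @ Cbar.T

# Average many samples of c = Cbar Y Y^T Cbar^T; since E[Y Y^T] = T I,
# the sample mean should approach T * C up to Monte Carlo error
m = 5000
acc = np.zeros((n, n))
for _ in range(m):
    Y = rng.standard_normal((n, T))
    acc += Cbar @ Y @ Y.T @ Cbar.T
print(np.max(np.abs(acc / m - T * C)))  # small (statistical error only)
```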
Here one can proceed in two different ways. First, one can apply the Bartlett (Cholesky) decomposition to $(1)$, and we immediately get that the result equals the log-determinant of the underlying matrix plus a sum of logarithms of independent chi-squared random variables with degrees of freedom $T-n+1, T-n+2, \cdots, T$ (these come from the squared diagonal entries of the lower-triangular factor in the Cholesky decomposition). The comment by Terence Tao here explains this nicely. Having done that, we immediately get the following result:
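The chi-squared structure can be checked empirically; a minimal sketch in Python (assuming only NumPy; the parameters are arbitrary): the squared diagonal entries of the Cholesky factor of $Y Y^T$ are independent $\chi^2$ variables with $T, T-1, \ldots, T-n+1$ degrees of freedom, and a $\chi^2_k$ variable has mean $k$, so the sample means should be close to those integers.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, m = 3, 10, 20000

# Squared diagonals of the Cholesky factor of Y Y^T are independent
# chi-squared variables with T, T-1, ..., T-n+1 degrees of freedom
# (Bartlett decomposition), so their means are T, T-1, ..., T-n+1.
diag_sq = np.zeros(n)
for _ in range(m):
    Y = rng.standard_normal((n, T))
    L = np.linalg.cholesky(Y @ Y.T)
    diag_sq += np.diag(L) ** 2
print(diag_sq / m)  # close to [T, T-1, T-2]
```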
\begin{equation} E\left[ \log\det(c) \right] = \log(\det(C)) + n \log(2) + \sum\limits_{p=T-n+1}^T \psi^{(0)}(p/2) \quad (i) \end{equation} where $\psi^{(0)}$ is the digamma function (the polygamma function of order zero).
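Formula $(i)$ is easy to verify by simulation. Here is a cross-check in Python (a sketch using SciPy's `digamma` for $\psi^{(0)}$), taking ${\bar C} = I_n$ so that $\log\det(C) = 0$:

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(2)
n, T = 3, 10

# Closed form (i) with Cbar = I, so log det(C) = 0
closed = n * np.log(2.0) + sum(digamma(p / 2) for p in range(T - n + 1, T + 1))

# Monte Carlo estimate of E[log det(Y Y^T)]
m = 20000
acc = 0.0
for _ in range(m):
    Y = rng.standard_normal((n, T))
    acc += np.linalg.slogdet(Y @ Y.T)[1]  # log|det| in a stable way
print(closed, acc / m)  # the two numbers agree up to Monte Carlo error
```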
Another way to proceed is to use the joint probability density of the eigenvalues $f_{n,T}(\lambda_1,\cdots,\lambda_n)$ of the Wishart ensemble (see here on the Wikipedia page) and simply compute the expected value of the sum of the logarithms of the eigenvalues. We have done this exercise, and the two results turned out to be exactly the same (as they should). We present a numerical confirmation in the Mathematica code below:
In[855]:= (*Expected value of the log det C*)
Clear[SS]; Clear[dSS]; Clear[DD];
DD[l_, i_] := If[i == n, 1, l[i]] - If[i == 1, 0, l[i - 1]];
NNangl[n_] := Pi^(n - 1) Sqrt[Pi]^Binomial[n - 1, 2]/
   Product[Gamma[(n - j)/2], {j, 0, n - 3}];
NN[T_, n_] := 1/n! NNangl[n]/(2^(n T/2) Sqrt[Pi]^Binomial[n, 2]*
     Product[Gamma[(T - j)/2], {j, 0, n - 1}]);
SS[n_, T_] := NIntegrate[
   Product[Abs[DD[l, xi1] - DD[l, xi2]], {xi1, 1, n}, {xi2, xi1 + 1, n}]*
    Product[DD[l, xi1]^((T - n - 1)/2), {xi1, 1, n}],
   Evaluate[Sequence @@
     Table[{l[xi1], If[xi1 == 1, 0, l[xi1 - 1]], 1}, {xi1, 1, n - 1}]]];
dSS[j_, n_, T_] := NIntegrate[
   Log[DD[l, j]]*
    Product[Abs[DD[l, xi1] - DD[l, xi2]], {xi1, 1, n}, {xi2, xi1 + 1, n}]*
    Product[DD[l, xi1]^((T - n - 1)/2), {xi1, 1, n}],
   Evaluate[Sequence @@
     Table[{l[xi1], If[xi1 == 1, 0, l[xi1 - 1]], 1}, {xi1, 1, n - 1}]]];
n = 3; T = RandomInteger[{n + 1, 20}];
(*Appendix B, page 13 in "Kullback-Leibler distance as a measure of the \
information filtered from multivariate data"*)
(n Log[2] + Sum[PolyGamma[p/2], {p, T - n + 1, T}]) // N
NN[T, n] NIntegrate[
  Sum[Log[l[xi1]], {xi1, 1, n}]*
   Product[Abs[l[xi1] - l[xi2]], {xi1, 1, n}, {xi2, xi1 + 1, n}]*
   Product[l[xi1]^((T - n - 1)/2), {xi1, 1, n}]*
   Exp[-1/2 Sum[l[xi1], {xi1, 1, n}]],
  Evaluate[Sequence @@ Table[{l[xi1], 0, Infinity}, {xi1, 1, n}]]]
NN[T, n] NIntegrate[
  Sum[Log[z DD[l, xi1]], {xi1, 1, n}]*
   Product[Abs[z DD[l, xi1] - z DD[l, xi2]], {xi1, 1, n}, {xi2, xi1 + 1, n}]*
   Product[(z DD[l, xi1])^((T - n - 1)/2), {xi1, 1, n}]*
   Exp[-1/2 z] z^(n - 1),
  Evaluate[Sequence @@ Join[{{z, 0, Infinity}},
    Table[{l[xi1], If[xi1 == 1, 0, l[xi1 - 1]], 1}, {xi1, 1, n - 1}]]]]
(n (Log[2] + PolyGamma[n T/2]) + Sum[dSS[j, n, T]/SS[n, T], {j, 1, n}])
Out[862]= 3.24187
Out[863]= 3.24187
Out[864]= 3.24187
Out[865]= 3.24187 + 0. I
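For readers without Mathematica, the eigenvalue-density route can also be cross-checked in Python for the special case $n = 2$ (a sketch; the choice $T = 5$ is arbitrary). For $n = 2$ the joint Wishart eigenvalue density is proportional to $|\lambda_1-\lambda_2|\,(\lambda_1\lambda_2)^{(T-3)/2}\,e^{-(\lambda_1+\lambda_2)/2}$; computing the normalization numerically as well avoids having to reproduce the constant `NN`.

```python
import numpy as np
from scipy import integrate
from scipy.special import digamma

n, T = 2, 5
a = (T - n - 1) / 2  # exponent in the eigenvalue density

# Unnormalized joint eigenvalue density of the n = 2 Wishart ensemble
def w(x, y):
    return abs(x - y) * (x * y) ** a * np.exp(-(x + y) / 2)

# Normalization and expectation of log(lambda1) + log(lambda2) over (0, inf)^2;
# the adaptive quadrature copes with the kink of |x - y| on the diagonal
Z, _ = integrate.dblquad(lambda y, x: w(x, y), 0, np.inf, 0, np.inf)
E, _ = integrate.dblquad(lambda y, x: (np.log(x) + np.log(y)) * w(x, y),
                         0, np.inf, 0, np.inf)

# Closed form (i) with C = I
closed = n * np.log(2.0) + sum(digamma(p / 2) for p in range(T - n + 1, T + 1))
print(E / Z, closed)  # the two values agree
```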
As a by-product of this computation, we stumbled upon an interesting identity:
\begin{eqnarray} \frac{\pi ^{\frac{n-1}{2}} \Gamma \left(\frac{n T}{2}\right)}{n! \left(\prod _{j=0}^{n-3} \Gamma \left(\frac{n-j}{2}\right)\right) \prod _{j=0}^{n-1} \Gamma \left(\frac{T-j}{2}\right)} \sum\limits_{j=1}^n {\mathfrak S}_j(n,T) = - n \psi^{(0)}\left(\frac{n T}{2}\right)+ \sum\limits_{p=T-n+1}^T \psi^{(0)}(p/2) \quad (ii) \end{eqnarray}
where
${\mathfrak S}_j(n,T) := \int\limits_{\Delta_{n-1}}\log(\Delta_j \lambda_j) \cdot
\prod\limits_{1 \le \xi_1 < \xi_2 \le n}
\left| \Delta_{\xi_1}\lambda_{\xi_1} - \Delta_{\xi_2} \lambda_{\xi_2} \right| \cdot
\prod\limits_{\xi_1=1}^n (\Delta_{\xi_1} \lambda_{\xi_1} )^{\frac{T-n-1}{2}} \cdot
\prod\limits_{\xi_1=1}^{n-1} d\lambda_{\xi_1} $ and $\Delta_i \lambda_i := 1_{i=n} + (1-1_{i=n})\lambda_i - \lambda_{i-1} 1_{i>1} $. Finally, $\Delta_{n-1} := \left\{ (\lambda_1,\ldots,\lambda_n) \,\middle|\, 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n \le 1 \right\} $ is the unit simplex.
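For what it is worth, identity $(ii)$ (with the summation index on the right-hand side read as $p$) can be verified numerically for $n = 2$, where the simplex integral collapses to a one-dimensional integral over $\lambda_1$ with $\Delta_1\lambda_1 = \lambda_1$ and $\Delta_2\lambda_2 = 1-\lambda_1$, and the product over $j = 0,\ldots,n-3$ is empty. A Python sketch:

```python
import math
import numpy as np
from scipy import integrate
from scipy.special import gammaln, digamma

n, T = 2, 5
a = (T - n - 1) / 2

# For n = 2 on the unit simplex: Delta_1 lambda_1 = l, Delta_2 lambda_2 = 1 - l
def frakS(j):
    def f(l):
        d = (l, 1 - l)  # j = 0, 1 correspond to frakS_1, frakS_2
        return np.log(d[j]) * abs(d[0] - d[1]) * (d[0] * d[1]) ** a
    # the kink of |2l - 1| sits at l = 1/2
    return integrate.quad(f, 0, 1, points=[0.5])[0]

lhs_sum = frakS(0) + frakS(1)

# Prefactor of (ii); the product over j = 0..n-3 equals 1 for n = 2
log_pref = ((n - 1) / 2) * np.log(np.pi) + gammaln(n * T / 2) \
    - np.log(math.factorial(n)) - sum(gammaln((T - j) / 2) for j in range(n))
lhs = np.exp(log_pref) * lhs_sum

rhs = -n * digamma(n * T / 2) + sum(digamma(p / 2) for p in range(T - n + 1, T + 1))
print(lhs, rhs)  # the two sides agree
```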
My question is: how do we prove identity $(ii)$ rigorously?