
I was reading through this post on the expected dot product of two random vectors. The answers there are quite interesting but not very conclusive. I was wondering if someone knows whether the question becomes more clear-cut with a few restrictions.

Let $C \in \mathbb R^{n \times n}$ be a PSD correlation matrix (with $-1 \leq C_{ij} \leq 1 \ \ \forall \ i,j$ and $C_{ii} = 1 \ \ \forall \ i$) and $\vec x \in \{0,1\}^n$ a binary random vector with $k < n$ ones. Is there anything one can say (upper/lower bounds or even an exact expression) about the expected value of the product?

$$ \mathbb E (\vec x^T C \ \vec x) $$

Janosh
    The straightforward bounds are $-k^2 \le \vec x^T C \ \vec x \le k^2$ for all $\vec x \in \{0,1\}^n$ with $k<n$ ones, and thereby these bounds also hold for the expected value. – IljaKlebanov Aug 06 '19 at 20:47

1 Answer


Assume $\vec x$ is uniform over all binary vectors with exactly $k$ ones, so that $E(x_i)=\frac kn$, $\Var(x_i)=\frac{k(n-k)}{n^2}$ and, for $i\ne m$, $\operatorname{cov}(x_i,x_m)=-\frac{k(n-k)}{n^2(n-1)}$ (sampling without replacement).

Since $C$ is positive semi-definite, it has a Cholesky decomposition $C=LL^T$ with $L=(l_{ij})$ lower triangular. Thus $$\newcommand{Var}{\operatorname{Var}} E(x^TCx)=E((x^TL)(x^TL)^T)=\sum_{j=1}^nE(y_j^2)$$ where the $y_j=\sum_{i=1}^nl_{ij}x_i$ are also random variables. Now, denoting $D_j=\sum_{i=1}^nl_{ij}^2$ and $T_j=\sum_{1\le i<m\le n}l_{ij}l_{mj}$ so that $D_j+2T_j=\left(\sum_{i=1}^nl_{ij}\right)^2$, $$E(y_j^2)=E(y_j)^2+\Var(y_j)$$ $$=\left(\frac kn\sum_{i=1}^nl_{ij}\right)^2+\sum_{i=1}^nl_{ij}^2\Var(x_i)+2\sum_{1\le i<m\le n}l_{ij}l_{mj}\operatorname{cov}(x_i,x_m)$$ $$=\frac{k^2}{n^2}(D_j+2T_j)+\frac{k(n-k)}{n^2}D_j-\frac kn\cdot\frac{2(n-k)}{n(n-1)}T_j$$ $$=\frac kn\left(\frac kn(D_j+2T_j)+\frac{n-k} nD_j-\frac{2(n-k)}{n(n-1)}T_j\right)$$ $$=\frac kn\left(D_j+2\,\frac{k-1}{n-1}\,T_j\right)$$
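The moments plugged in above (for $\vec x$ uniform over all size-$k$ supports) can be verified by brute-force enumeration. The following is my own sketch, not part of the original answer; all names in it are illustrative:

```python
# Brute-force check of the sampling-without-replacement moments used above,
# for x uniform over all binary vectors with exactly k of n ones:
#   E(x_i) = k/n,  Var(x_i) = k(n-k)/n^2,  cov(x_i, x_j) = -k(n-k)/(n^2 (n-1))
from itertools import combinations

n, k = 6, 2
supports = list(combinations(range(n), k))  # all C(n, k) supports, equally likely
N = len(supports)

x0 = [1.0 if 0 in s else 0.0 for s in supports]  # indicator of coordinate 0
x1 = [1.0 if 1 in s else 0.0 for s in supports]  # indicator of coordinate 1

mean = sum(x0) / N
var = sum(v * v for v in x0) / N - mean**2
cov = sum(a * b for a, b in zip(x0, x1)) / N - mean**2

print(abs(mean - k / n) < 1e-12)                          # True
print(abs(var - k * (n - k) / n**2) < 1e-12)              # True
print(abs(cov + k * (n - k) / (n**2 * (n - 1))) < 1e-12)  # True
```

By symmetry, checking coordinates 0 and 1 suffices: all coordinates are exchangeable under the uniform distribution over supports.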

Since $L$ is lower triangular, only $l_{jj}$ and the entries below it can be non-zero, so $$E(y_j^2)=\frac kn\left(\sum_{i=j}^nl_{ij}^2+2\,\frac{k-1}{n-1}\sum_{j\le i<m\le n}l_{ij}l_{mj}\right)$$ $$E(x^TCx)=\frac kn\sum_{j=1}^n\left(\sum_{i=j}^nl_{ij}^2+2\,\frac{k-1}{n-1}\sum_{j\le i<m\le n}l_{ij}l_{mj}\right)$$
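As a sanity check, the closed form can be compared against an exhaustive average of $x^TCx$ over all $\binom nk$ supports for a small random correlation matrix. This is my own verification sketch, not part of the answer; the construction of $C$ and all variable names are illustrative:

```python
# Verify E(x^T C x) = (k/n) * sum_j (D_j + 2*(k-1)/(n-1) * T_j)
# by exhaustive enumeration over all k-subsets, for a small PSD correlation matrix.
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3

# Build a random PSD correlation matrix: C = A A^T, then normalize the diagonal.
A = rng.standard_normal((n, n))
S = A @ A.T
d = np.sqrt(np.diag(S))
C = S / np.outer(d, d)                     # unit diagonal, PSD

# Exact expectation by averaging over all C(n, k) equally likely supports.
vals = []
for support in combinations(range(n), k):
    x = np.zeros(n)
    x[list(support)] = 1.0
    vals.append(x @ C @ x)
exact = np.mean(vals)

# The formula, via the columns of the Cholesky factor L.
L = np.linalg.cholesky(C)                  # lower triangular, C = L L^T
formula = 0.0
for j in range(n):
    col = L[:, j]
    D = np.sum(col**2)                     # D_j = sum_i l_ij^2
    T = (np.sum(col)**2 - D) / 2           # since D_j + 2 T_j = (column sum)^2
    formula += (k / n) * (D + 2 * (k - 1) / (n - 1) * T)

print(abs(exact - formula) < 1e-9)         # the two agree
```

Note that the triangular structure of $L$ is exploited implicitly here: entries above the diagonal are exactly zero, so summing over whole columns gives the same result as the restricted sums in the formula.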

Parcly Taxel