Let $A,B$ have the appropriate size. How can we show the von Neumann trace inequality?

$$ \mbox{Tr}(AB) \leq \sum_{i=1}^n \sigma_{A,i}\sigma_{B,i} $$

Also, what is the intuition behind this inequality?

2 Answers

One may view the trace inequality as a pile of Cauchy-Schwarz inequalities.

Let us tackle the (more general) complex case here. Write the singular value decompositions $A=U_ASV_A^\ast$ and $B=U_BDV_B^\ast$. By the cyclic property of the trace, $\operatorname{tr}(AB)=\operatorname{tr}(DUSV^\ast)$ with $U=V_B^\ast U_A$ and $V=U_B^\ast V_A$, so the inequality $|\operatorname{tr}(AB)|\le\sum_i\sigma_i(A)\sigma_i(B)$ reduces to $$ |\operatorname{tr}(DUSV^\ast)|\le\operatorname{tr}(DS)\tag{1} $$ where $U,V$ are two unitary matrices and $D=\operatorname{diag}(d_1,\ldots,d_n),\,S=\operatorname{diag}(s_1,\ldots,s_n)$ are two diagonal matrices with nonnegative and decreasing diagonal entries.

Let $P_k$ denote the orthogonal projection matrix $I_k\oplus0_{n-k}$. Note that $D$ is a nonnegatively weighted combination of the $P_i$s. In fact, $$ D=(d_1-d_2)P_1+\cdots+(d_{n-1}-d_n)P_{n-1}+d_nP_n $$ and similarly for $S$. For convenience, let us write $D=\sum_ka_kP_k$ and $S=\sum_lb_lP_l$, where the $a_k$s and $b_l$s are nonnegative. The inequality $(1)$ thus becomes $$ \left|\sum_{k,l}a_kb_l\operatorname{tr}(P_kUP_lV^\ast)\right| \le\sum_{k,l}a_kb_l\operatorname{tr}(P_kP_l).\tag{2} $$ So, by the triangle inequality, it suffices to prove that $$ \left|\operatorname{tr}(P_kUP_lV^\ast)\right| \le\operatorname{tr}(P_kP_l)\tag{3} $$ for each pair of $k$ and $l$.

Assume that $k\ge l$, or else interchange the roles of $k$ and $l$, so that $\operatorname{tr}(P_kP_l)=l$. Let $u_i$ and $v_i$ denote the $i$-th columns of $U$ and $V$ respectively. As $P_kUP_l=[P_ku_1,\,\ldots,\,P_ku_l,\,0,\ldots,0]$, the inequality $(3)$ is equivalent to $$ \left|\sum_{i=1}^l \langle P_ku_i,\,v_i\rangle\right|\le l.\tag{4} $$ Since $P_k$ is an orthogonal projection and the columns of $U$ and $V$ are unit vectors, $\|P_ku_i\|_2\le1$ and $\|v_i\|_2=1$. Therefore each of the $l$ summands has modulus at most $1$ by the Cauchy-Schwarz inequality, and $(4)$ follows.
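
The steps of this argument can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the helper names `random_unitary`, `projection` and `telescoping_weights` are hypothetical and not part of the answer. It rebuilds $D$ from the projections $P_k$, then tests inequalities $(1)$ and $(3)$ on random unitary $U,V$.

```python
import numpy as np

def random_unitary(n, rng):
    # Hypothetical helper: the Q factor of a complex Gaussian matrix is unitary.
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q

def projection(k, n):
    # P_k = I_k (+) 0_{n-k}
    p = np.zeros((n, n))
    p[:k, :k] = np.eye(k)
    return p

def telescoping_weights(d):
    # a_k = d_k - d_{k+1} for k < n, and a_n = d_n
    return np.append(d[:-1] - d[1:], d[-1])

rng = np.random.default_rng(0)
n = 5
U = random_unitary(n, rng)
V = random_unitary(n, rng)
d = np.sort(rng.random(n))[::-1]  # nonnegative, decreasing
s = np.sort(rng.random(n))[::-1]
D, S = np.diag(d), np.diag(s)

# D as a nonnegatively weighted combination of the P_k
a = telescoping_weights(d)
D_rebuilt = sum(a[k - 1] * projection(k, n) for k in range(1, n + 1))
decomposition_ok = np.allclose(D, D_rebuilt)

# Inequality (1): |tr(D U S V*)| <= tr(D S)
tol = 1e-12
lhs = abs(np.trace(D @ U @ S @ V.conj().T))
rhs = np.trace(D @ S)
inequality_1_ok = lhs <= rhs + tol

# Inequality (3): |tr(P_k U P_l V*)| <= tr(P_k P_l) = min(k, l)
inequality_3_ok = all(
    abs(np.trace(projection(k, n) @ U @ projection(l, n) @ V.conj().T))
    <= min(k, l) + tol
    for k in range(1, n + 1)
    for l in range(1, n + 1)
)

print(decomposition_ok, inequality_1_ok, inequality_3_ok)
```

A random test like this is no substitute for the proof; it only confirms that the decomposition and the two inequalities behave as claimed on one sample.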

user1551
  • 139,064
  • 1
    Why is $\operatorname{tr}(USV^TD)\le\operatorname{tr}(SD)$ true? Could you prove this? –  Jan 16 '19 at 22:03
  • I do not understand how you have changed the von Neumann inequality into your trace inequality. I cannot see the steps. Could you elaborate on that? –  Jan 16 '19 at 22:11
  • 1
    I am stuck at this point: "Using the tracial property and singular value decomposition, one can reduce the inequality". How? I mean, how can you get $|\operatorname{tr}(USV^\ast D)|\le\operatorname{tr}(SD)$ from $|\operatorname{tr}(AB)|\le\sum_i\sigma_i(A)\sigma_i(B)$? Let $A=U_ASV_A^\ast$ and $B=U_BDV_B^\ast$, so $|\operatorname{tr}(U_ASV_A^\ast U_BDV_B^\ast)|=|\operatorname{tr}(V_B^\ast U_ASV_A^\ast U_BD)|$. So $U=V_B^\ast U_A$ and $V=U_B^\ast V_A$. Is that what you mean? –  Jan 16 '19 at 22:45
  • Where does this come from? $S=(s_1-s_2)P_1+\cdots+(s_{n-1}-s_n)P_{n-1}+s_nP_n$ –  Jan 16 '19 at 22:52
  • @user1551: Ok. Please explain it to me later. –  Jan 16 '19 at 23:00
  • @Saeed Just add the matrices directly. E.g. when $n=3$, we have $\operatorname{diag}(s_1,s_2,s_3)=(s_1-s_2)\operatorname{diag}(1,0,0)+(s_2-s_3)\operatorname{diag}(1,1,0)+s_3\operatorname{diag}(1,1,1)$. – user1551 Jan 17 '19 at 03:39
  • @user1551 your proof is amazing :) – weirdo Sep 29 '19 at 06:04
  • Very nice proof. It also shows that the inequality is strict for non-commuting square positive-definite matrices. – a06e Feb 14 '20 at 14:52

A different (non-standard) proof:

Consider the SVD of $A$, with singular values $\sigma_i(A)$ and associated orthonormal bases $e_i$, $\tilde{e}_i$, so that $Ae_i=\sigma_i(A)\tilde{e}_i$; similarly for $B$, with bases $f_j$, $\tilde{f}_j$. Then, using $|ab|\le(|a|^2+|b|^2)/2$ in the second-to-last step,
\begin{align}
\mathrm{tr}(A^*B)&=\sum_i\langle Ae_i,Be_i\rangle\\
&=\sum_i\sigma_i(A)\langle\tilde{e}_i,Be_i\rangle\\
&=\sum_{ij}\sigma_i(A)\langle\tilde{e}_i,\tilde{f}_j\rangle\langle \tilde{f}_j,Be_i\rangle \\
&=\sum_{ij}\sigma_i(A)\sigma_j(B)\langle\tilde{e}_i,\tilde{f}_j\rangle\langle f_j,e_i\rangle\\
|\mathrm{tr}(A^*B)|&\le\sum_{ij}\sigma_i(A)\sigma_j(B)(|\langle\tilde{e}_i,\tilde{f}_j\rangle|^2+|\langle f_j,e_i\rangle|^2)/2\\
&=\vec{\sigma}(A)^*\,T\,\vec{\sigma}(B)
\end{align}
where $T_{ij}=\tfrac{1}{2}|\langle\tilde{e}_i,\tilde{f}_j\rangle|^2+\tfrac{1}{2}|\langle f_j,e_i\rangle|^2$. $T$ is a doubly stochastic matrix since $$\sum_iT_{ij}=\tfrac{1}{2}\sum_i|\langle\tilde{e}_i,\tilde{f}_j\rangle|^2+\tfrac{1}{2}\sum_i|\langle f_j,e_i\rangle|^2=\tfrac{1}{2}\|\tilde{f}_j\|^2+\tfrac{1}{2}\|f_j\|^2=1$$ (and similarly for the sums over $j$).

But by Birkhoff's theorem, every doubly stochastic matrix is a convex combination of permutation matrices, $$T=\sum_k\alpha_kP_k,\qquad\qquad \alpha_k\ge0,\quad\sum_k\alpha_k=1,$$ hence $$|\mathrm{tr}(A^*B)|\le\sum_k\alpha_k(\vec{\sigma}(A)^*P_k\vec{\sigma}(B))\le\max_k\vec{\sigma}(A)^*P_k\vec{\sigma}(B).$$ By the rearrangement inequality, the largest sum of $(\sigma_i(A))$ dotted with the permuted values of $(\sigma_i(B))$ is the one in which the values align from biggest to smallest. This is precisely the von Neumann trace inequality; replacing $A$ by $A^*$, which has the same singular values, gives the form with $\mathrm{tr}(AB)$.
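
The construction of $T$ can also be checked numerically. The sketch below is illustrative only and assumes NumPy; the variable names are hypothetical. It takes $e_i,\tilde{e}_i$ to be the right and left singular vectors returned by `np.linalg.svd` (which orders singular values from largest to smallest), builds $T$, and tests the double stochasticity and the chain $|\mathrm{tr}(A^*B)|\le\vec{\sigma}(A)^*T\vec{\sigma}(B)\le\sum_i\sigma_i(A)\sigma_i(B)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# A = UA diag(sA) VAh, so A e_i = sA[i] * e~_i with e_i = VA[:, i], e~_i = UA[:, i]
UA, sA, VAh = np.linalg.svd(A)
UB, sB, VBh = np.linalg.svd(B)
VA, VB = VAh.conj().T, VBh.conj().T

G = UA.conj().T @ UB  # G[i, j] = <e~_i, f~_j>
H = VB.conj().T @ VA  # H[j, i] = <f_j, e_i>

# The expansion tr(A* B) = sum_ij sigma_i(A) sigma_j(B) <e~_i, f~_j> <f_j, e_i>
trace_value = np.trace(A.conj().T @ B)
expansion = sA @ (G * H.T) @ sB
expansion_ok = np.isclose(trace_value, expansion)

# T[i, j] = (|<e~_i, f~_j>|^2 + |<f_j, e_i>|^2) / 2
T = (np.abs(G) ** 2 + np.abs(H.T) ** 2) / 2
doubly_stochastic = (
    np.allclose(T.sum(axis=0), 1.0)
    and np.allclose(T.sum(axis=1), 1.0)
    and bool(np.all(T >= 0))
)

tol = 1e-10
middle = sA @ T @ sB  # sigma(A)^T T sigma(B)
upper = sA @ sB       # sum_i sigma_i(A) sigma_i(B), both sorted descending
chain_ok = abs(trace_value) <= middle + tol and middle <= upper + tol

print(expansion_ok, doubly_stochastic, chain_ok)
```

The check does not compute a Birkhoff decomposition; it only confirms the two ends of the chain, which is what the rearrangement step guarantees.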

Chrystomath
  • 10,798