  1. Is $\mbox{Tr}\left(X X^T X X^T\right)$ a convex function of arbitrary real matrix $X$?

  2. More generally, is $\mbox{Tr}\left(\left(X X^{\dagger}\right)^m\right)$ a convex function of arbitrary complex matrix $X$ for any integer $m \ge 1$?

Any advice or suggestions would be greatly appreciated.


The proof hint:

Let us apply the SVD to the matrix $X$: $X = U D V^{\dagger}$. Every matrix has an SVD with non-negative singular values on the main diagonal of $D$. Next:

$\left(X X^{\dagger}\right)^m = U D V^{\dagger} V D U^{\dagger} \, U D V^{\dagger} V D U^{\dagger} \cdots U D V^{\dagger} V D U^{\dagger} = U D^{2 m} U^{\dagger}.$

Note, all unitary matrices are cancelled in between, because $U^{\dagger}U = V^{\dagger}V = I$.

$\mbox{Tr} \left(\left(X X^{\dagger}\right)^m\right) = \mbox{Tr} \left(U D^{2 m} U^{\dagger}\right) = \mbox{Tr} \left(D^{2 m} U^{\dagger} U\right) = \mbox{Tr} \left(D^{2 m}\right) = \sum_i \sigma_i^{2 m}$, where we used the cyclic property of the trace, and $\{\sigma_i\}$ are the singular values of the matrix $X$.
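As a quick sanity check, the identity $\mbox{Tr}\left(\left(X X^{\dagger}\right)^m\right) = \sum_i \sigma_i^{2m}$ can be verified numerically on a random complex matrix; a minimal NumPy sketch:

```python
import numpy as np

# Sanity check: Tr((X X^†)^m) equals the sum of sigma_i^(2m)
# for a random complex matrix X.
rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

lhs = np.trace(np.linalg.matrix_power(X @ X.conj().T, m)).real
sigma = np.linalg.svd(X, compute_uv=False)   # singular values of X
rhs = float(np.sum(sigma ** (2 * m)))

assert np.isclose(lhs, rhs)
```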

Let $\{x_i\}$, $\{y_i\}$ and $\{z_i\}$ be the singular values of arbitrary matrices $X$, $Y$ and their convex combination $Z = \alpha X + (1 - \alpha) Y$, respectively. It was shown below by @PSL that for the Frobenius norm ($m = 1$) the following holds true:

$\alpha \sum_i x_i^2 + (1 - \alpha) \sum_i y_i^2 \ge \sum_i z_i^2$.

Considering that the function $\phi: x \mapsto x^m$, $x \in \mathbb{R}^+$, is convex, would it be possible to show that the inequality also holds for $m > 1$:

$\alpha \sum_i x_i^{2 m} + (1 - \alpha) \sum_i y_i^{2 m} \ge \sum_i z_i^{2 m}$ ?

Update: by numerical simulation I found that $\sum_i x_i^{2} \ge \sum_i z_i^{2}$ does not necessarily entail $\sum_i x_i^{4} \ge \sum_i z_i^{4}$; the implication failed on roughly 9% of random configurations. It seems this line of thought does not work. However, the extended brute-force simulation below still succeeds for $m$ from 2 to 5.
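The failure of this implication can also be exhibited deterministically. The following pair of diagonal matrices is a hand-picked illustration (my own construction, not taken from the simulation):

```python
import numpy as np

# Hand-picked illustrative pair: X, Y diagonal, Z = 0.5*X + 0.5*Y.
X = np.diag([1.0, 1.0])
Y = np.diag([1.6, -1.0])
Z = 0.5 * X + 0.5 * Y                     # = diag(1.3, 0)

x = np.linalg.svd(X, compute_uv=False)    # singular values (1, 1)
z = np.linalg.svd(Z, compute_uv=False)    # singular values (1.3, 0)

# sum x_i^2 >= sum z_i^2 holds ...
assert np.sum(x**2) >= np.sum(z**2)       # 2.0 >= 1.69
# ... but sum x_i^4 >= sum z_i^4 fails:
assert np.sum(x**4) < np.sum(z**4)        # 2.0 < 2.8561
```

Note that the convexity inequality itself, $\alpha \sum_i x_i^4 + (1-\alpha) \sum_i y_i^4 \ge \sum_i z_i^4$, still holds for this pair ($0.5 \cdot 2 + 0.5 \cdot 7.5536 \approx 4.78 \ge 2.86$), so the example only rules out this particular line of attack, not the conjecture.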


Brute force approach to answer the questions. Here I literally check convexity on random matrices. The Python code speaks for itself:

```python
import numpy as np

tol = 10.0 * np.finfo(float).eps
count_ok, count_fail = 0, 0

for m in range(2, 5 + 1):
    print("m:", m)
    for dim in range(2, 10 + 1):
        print(f"matrix size: {dim}x{dim}")
        for test in range(100000):
            X = 2 * np.random.rand(dim, dim) - 1
            Y = 2 * np.random.rand(dim, dim) - 1
            XXt = X @ X.T
            YYt = Y @ Y.T
            for t in np.linspace(0.01, 0.99, 20):
                Z = X * t + Y * (1 - t)
                ZZt = Z @ Z.T
                ok = (np.trace(np.linalg.matrix_power(ZZt, m))
                      <= np.trace(np.linalg.matrix_power(XXt, m)) * t
                         + np.trace(np.linalg.matrix_power(YYt, m)) * (1 - t)
                         + tol)
                if ok:
                    count_ok += 1
                else:
                    count_fail += 1

print(f"succeeded: {count_ok:,} times")
print(f"failed: {count_fail:,} times")
```

Output (tail):

```
succeeded: 72,000,000 times
failed: 0 times
```

  • What did you try to solve the problem? – Arctic Char Nov 14 '21 at 21:33
  • Sorry, I did not get the question. – Albert65 Nov 14 '21 at 21:40
  • The question of @Arctic Char is crystal clear: have you done some previous Web searching, have you made some computational attempts, for example computations entrywise in dimension 2 or 3? etc. – Jean Marie Nov 14 '21 at 22:02
  • What I tried beforehand was quite naive. For now, I realised which direction to move on, thanks to @Bananach. – Albert65 Nov 14 '21 at 22:35
  • Brute force simulation with random matrices of size 2x2 to 10x10 favours the "yes" answer to the first question (1,800,000 trials have been made in total). – Albert65 Nov 14 '21 at 23:57
  • @Albert65 Could you say more about this simulation in an answer that I will surely upvote? This kind of approach is alas still too rare among "pure mathematicians" and it would help the OP. – Jean Marie Nov 15 '21 at 08:46
  • Connected interesting properties here – Jean Marie Nov 15 '21 at 08:51
  • @JeanMarie my brute force "solution" is by no means a proof, rather a hint that the answer might be "yes". – Albert65 Nov 15 '21 at 16:11
  • The SVD idea is nice, but I find it a bit problematic to use the singular values as the variables, when you're interested in showing the convexity in the variable $X$. I might be missing something obvious, but would still be a bit careful there. – PSL Nov 15 '21 at 18:16

3 Answers


Your problem reduces to showing that $(XX^T)^m$ is convex in the matrix sense. Recall that a matrix-valued function $f$ is convex if $\alpha f(X) + (1-\alpha) f(Y) \succeq f(\alpha X + (1-\alpha)Y)$ for all $X$, $Y$ and all $\alpha \in [0,1]$.

This is because $A \succeq B$ implies $\textrm{Tr}(A) \geq \textrm{Tr}(B)$.

For $m = 1$ this is not too difficult:

Claim 1. $AA^T\succeq 0$.

Proof. $v^TAA^Tv = \|Av\|^2_2 \geq 0$ for every $v$.

Claim 2. $f(X) = XX^T$ is convex in the matrix sense.

Proof. Compute $$ \alpha f(X) + (1-\alpha) f(Y) - f(\alpha X + (1-\alpha)Y) = \alpha(1-\alpha)XX^T + \alpha(1-\alpha)YY^T - \alpha(1-\alpha)(XY^T + YX^T). $$ We want to show that the above is $\succeq 0$, which happens if and only if $XX^T + YY^T - (XY^T + YX^T) \succeq 0$, as $\alpha \in (0,1)$. But this is just $(X - Y)(X - Y)^T$, which is positive semidefinite by Claim 1. Hence, $\alpha f(X) + (1-\alpha) f(Y) \succeq f(\alpha X + (1-\alpha)Y)$.
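The algebra in Claim 2 can be spot-checked numerically; a minimal sketch with NumPy, using random real matrices and an arbitrary $\alpha$:

```python
import numpy as np

# Numerical spot-check of Claim 2: the convexity gap equals
# alpha*(1-alpha)*(X-Y)(X-Y)^T, which is positive semidefinite.
rng = np.random.default_rng(1)
n, a = 5, 0.3
X = rng.standard_normal((n, n))
Y = rng.standard_normal((n, n))

Zc = a * X + (1 - a) * Y
gap = a * (X @ X.T) + (1 - a) * (Y @ Y.T) - Zc @ Zc.T
expected = a * (1 - a) * (X - Y) @ (X - Y).T

assert np.allclose(gap, expected)
assert np.linalg.eigvalsh(gap).min() >= -1e-10   # PSD up to roundoff
```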

PSL

By https://en.m.wikipedia.org/wiki/Trace_inequality, the function $A\mapsto \operatorname{tr} f(A)$ is convex if $f\colon \mathbb{R}\to\mathbb{R}$ is.
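A quick numerical illustration of this fact on random symmetric matrices, taking the convex function $f(x) = e^x$ (an illustrative choice; any convex $f$ would do):

```python
import numpy as np

# Spot-check (illustrative): A -> Tr f(A) is convex on symmetric
# matrices when f is convex; here f = exp.
def tr_f(A, f):
    w = np.linalg.eigvalsh(A)    # eigenvalues of the symmetric matrix A
    return float(np.sum(f(w)))

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)); A = A + A.T
B = rng.standard_normal((n, n)); B = B + B.T

for t in np.linspace(0.1, 0.9, 9):
    C = t * A + (1 - t) * B
    assert tr_f(C, np.exp) <= t * tr_f(A, np.exp) + (1 - t) * tr_f(B, np.exp) + 1e-9
```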

Bananach

Yes, $X\mapsto\operatorname{tr}((XX^\ast)^m)$ is convex for any integer $m\ge1$.

The function is a composition of the inner function $h:M_n(\mathbb C)\ni X\mapsto A=XX^\ast\in\mathbb S_+$ (where $\mathbb S_+$ denotes the set of all positive semidefinite matrices) and the outer function $g:\mathbb S_+\ni A\mapsto\operatorname{tr}(A^m)\in\mathbb R$.

The matrix-valued inner function $h$ is convex because $$ \theta h(X)+(1-\theta)h(Y)-h\left(\theta X+(1-\theta)Y\right) =\theta(1-\theta)(X-Y)(X-Y)^\ast $$ is positive semidefinite.

The outer function $g$ is convex because it is of the form $g(A)=\operatorname{tr}(f(A))$ (see footnote below), where $f(x)=x^m$ is a continuous convex function on the positive reals. Clearly $g(A)=\operatorname{tr}(A^m)$ is also (weakly) increasing on $\mathbb S_+$ (although $A\mapsto A^m$ is not increasing on $\mathbb S_+$ in general).

Therefore $g\circ h$ is the composition of a convex inner function with a convex increasing outer function. Hence it is convex.

Footnote.

As pointed out by the Wikipedia article linked in Bananach's answer, a proof that $f$ continuous and convex implies $A\mapsto\operatorname{tr} f(A)$ is convex can be found, for instance, in the 2009 paper Trace inequalities and quantum entropy: an introductory course by Eric Carlen. Although the domain and codomain of the $f$ in that paper differ from ours (reals vs. positive reals, and Hermitian vs. PSD), the same proof applies.
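The monotonicity claim above — $A \succeq B \succeq 0$ implies $\operatorname{tr}(A^m) \ge \operatorname{tr}(B^m)$ — follows from Weyl's inequality (adding a PSD matrix cannot decrease any eigenvalue) and can be illustrated with a small NumPy sketch:

```python
import numpy as np

# Illustrative check: A ⪰ B ⪰ 0 implies tr(A^m) >= tr(B^m).
rng = np.random.default_rng(3)
n, m = 4, 3
R = rng.standard_normal((n, n))
B = R @ R.T                      # B is PSD by construction
S = rng.standard_normal((n, n))
A = B + S @ S.T                  # A = B + PSD, so A ⪰ B ⪰ 0

assert np.trace(np.linalg.matrix_power(A, m)) >= np.trace(np.linalg.matrix_power(B, m))
```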

user1551
  • Thank you very much @user1551. A very concise and clear explanation. I tried a similar way but was unable to show that $g(A)$ is increasing. Thanks for the reference. For anyone who is interested in why $g(A)$ should be increasing, see, for example, this answer, where the inequalities should be replaced by matrix definiteness relations. – Albert65 Nov 18 '21 at 22:41