3

Let $\mathcal S_+^d$ be the set of real $d \times d$ symmetric positive semidefinite matrices and $\mathcal S_{++}^d$ the set of real $d \times d$ symmetric positive definite matrices.

Following Section 1.3 of the paper Quantum Optimal Transport for Tensor Field Processing (arXiv link, published not-open-access version here), for $P \in \mathcal S_+^d$ and $(Q_i)_{i \in I} \subset \mathcal S_+^d$ define $$ \exp\left(P + \sum_{i \in I} \log(Q_i) \right) $$ to be the matrix in $\mathcal S_+^d$ with kernel $\sum_{i \in I} \ker(Q_i)$ which is unambiguously defined on the orthogonal complement of that space.


Just using this definition, we get the following: for $x \in \ker(P)^{\perp} = \text{ran}(P^{\mathsf t}) = \text{ran}(P)$ there exists a $y \in \mathbb R^d$ with $x = P y$. As $P$ is symmetric, we can orthogonally diagonalize $P = U^{\mathsf t} D U$, where $D := \text{diag}(\lambda_1, \ldots, \lambda_d)$. Hence, with the convention $\log(\lambda_j)^k \lambda_j := 0$ whenever $\lambda_j = 0$, \begin{align*} \exp(\log(P)) x & = \exp\big(0_{d \times d} + \log(P)\big) P y = \sum_{k = 0}^{\infty} \frac{1}{k!} \log(P)^k \cdot P y \\ & = \sum_{k = 0}^{\infty} \frac{1}{k!} U^{\mathsf t} \log(D)^k U \cdot U^{\mathsf t} D U y \\ & = U^{\mathsf t} \left(\sum_{k = 0}^{\infty} \frac{1}{k!} \log(D)^k D \right) U y = U^{\mathsf t} \text{diag}\left(\left(\sum_{k = 0}^{\infty} \frac{1}{k!} \log(\lambda_j)^k \lambda_j \right)_{j = 1}^{d} \right) U y \\ & = U^{\mathsf t} \text{diag}\left(\left(\lambda_j^2 \right)_{j = 1}^{d} \right) U y = U^{\mathsf t} D^2 U y = P^2 y = P x. \end{align*} Hence $\exp(\log(P)) = P$.

This implies moreover that $\exp\left(\frac{1}{2} \log(P)\right) = \sqrt{P}$ for $P \in \mathcal S_+^d$, since $$\exp\left(\frac{1}{2} \log(P)\right) \cdot \exp\left(\frac{1}{2} \log(P)\right) = \exp(\log(P)) = P.$$
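Both identities are easy to sanity-check numerically. The sketch below (my own illustration, not from the paper) uses NumPy; the hypothetical helper `exp_s_log` implements the spectral meaning of $\exp(s \log P)$ used above, i.e. $\lambda_j \mapsto \lambda_j^s$ on $\text{ran}(P)$ and $0$ on $\ker(P)$:

```python
import numpy as np

def exp_s_log(P, s, tol=1e-12):
    # Spectral version of exp(s*log(P)) for a symmetric PSD matrix P,
    # in the kernel-aware sense: zero on ker(P), P^s on ran(P).
    lam, U = np.linalg.eigh(P)
    pos = lam > tol                      # eigenvalues belonging to ran(P)
    return (U[:, pos] * lam[pos]**s) @ U[:, pos].T

# Singular PSD example: a rank-2 matrix in R^3 (random test data)
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
P = A @ A.T                              # PSD with one-dimensional kernel

assert np.allclose(exp_s_log(P, 1.0), P)                      # exp(log P) = P
assert np.allclose(exp_s_log(P, 0.5) @ exp_s_log(P, 0.5), P)  # exp(1/2 log P) = sqrt(P)
```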


My question. In Example 2, for $P, Q \in \mathcal S_+^d$ the "geometric mean" $$ M(P, Q) := \exp\left(\frac{1}{2} \log(P) + \frac{1}{2} \log(Q)\right) $$ is mentioned. (Side question: Is $M(P, Q) = P^{\frac{1}{2}} (P^{-\frac{1}{2}} Q P^{-\frac{1}{2}})^{\frac{1}{2}} P^{\frac{1}{2}}$, that is, is $M$ the "real" geometric mean?)
Now, since the matrix logarithm is not defined for singular matrices $P \in \mathcal S_{+}^d \setminus \mathcal S_{++}^d$, I don't know how to interpret the notation $M$.

Remark. We have $M(P, Q) = f(\sqrt{P}, \sqrt{Q})$ for the function $f(A, B) := \exp\big(\log(A) + \log(B)\big)$, which is discussed here. In particular, $M(P, Q) = \sqrt{P Q}$ if $P Q = Q P$. By the Golden-Thompson inequality we have $\text{tr}\big(M(P, Q)\big) \le \text{tr}\big(\sqrt{P} \sqrt{Q}\big)$ for $P, Q \in \mathcal S_{++}^d$, with equality if and only if $P$ and $Q$ commute.
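For what it's worth, this trace inequality is easy to test numerically. Below is a small NumPy sketch (my own illustration; the helper `sym_fun` applies a scalar function spectrally to a symmetric matrix):

```python
import numpy as np

def sym_fun(A, f):
    # apply a scalar function to a symmetric matrix via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T

rng = np.random.default_rng(0)
B, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
P = B @ B.T + 3 * np.eye(3)        # generic (non-commuting) SPD pair
Q = C @ C.T + 3 * np.eye(3)

M = sym_fun(0.5 * sym_fun(P, np.log) + 0.5 * sym_fun(Q, np.log), np.exp)
lhs = np.trace(M)
rhs = np.trace(sym_fun(P, np.sqrt) @ sym_fun(Q, np.sqrt))
assert lhs <= rhs + 1e-10          # tr M(P,Q) <= tr(sqrt(P) sqrt(Q))

# Commuting (diagonal) pair: equality holds.
Pc, Qc = np.diag([1.0, 2.0, 3.0]), np.diag([4.0, 5.0, 6.0])
Mc = sym_fun(0.5 * sym_fun(Pc, np.log) + 0.5 * sym_fun(Qc, np.log), np.exp)
assert np.isclose(np.trace(Mc), np.trace(sym_fun(Pc, np.sqrt) @ sym_fun(Qc, np.sqrt)))
```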

Another perspective: in the scalar case ($d = 1$), $M$ is, up to an additive constant of $-\ln(2)$, equal to the LogSumExp function.

Ideas. If $P, Q \in \mathcal S_{++}^d$ were positive definite, then (by the uniqueness of the PSD square root) we would have $\frac{1}{2} \log(P) = \log(\sqrt{P})$, and then $$ \exp\left(\frac{1}{2} \log(P) + \frac{1}{2} \log(Q)\right) = \exp\big( 0_{d \times d} + \log(\sqrt{P}) + \log(\sqrt{Q})\big) $$ could be interpreted in the above sense as the matrix in $\mathcal S_+^d$ with kernel $\ker(\sqrt{P}) + \ker(\sqrt{Q})$ which, on $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp} = \ker(\sqrt{P})^{\perp} \cap \ker(\sqrt{Q})^{\perp}$, equals $\exp(0_{d \times d}) = \text{id}_{d \times d}$, and which is thus the orthogonal projector onto $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp}$. For singular matrices $P, Q \in \mathcal S_+^d \setminus \mathcal S_{++}^d$ we could still define $M$ to be the orthogonal projector onto $\big( \ker(\sqrt{P}) + \ker(\sqrt{Q}) \big)^{\perp}$, but I don't know whether this is a sensible definition.

Response to the answer by Igor Rivin. Is the following limit what you had in mind, and is there a way to evaluate it? $$ \lim_{\varepsilon \searrow 0} \exp\left( \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I)\right) = \exp\left( \lim_{\varepsilon \searrow 0} \left( \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I) \right) \right), $$ where the equality (provided the inner limit exists) is due to the continuity of the matrix exponential.
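In the commuting case the regularized expression can at least be checked numerically: for diagonal singular matrices it converges to $\sqrt{PQ}$. A NumPy sketch of the $\varepsilon$-regularization (my own illustration; the non-commuting singular case can be explored the same way):

```python
import numpy as np

def sym_fun(A, f):
    # apply a scalar function to a symmetric matrix via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T

def M_eps(P, Q, eps):
    # exp( 1/2 log(P + eps I) + 1/2 log(Q + eps I) )
    d = P.shape[0]
    L = 0.5 * sym_fun(P + eps * np.eye(d), np.log) \
      + 0.5 * sym_fun(Q + eps * np.eye(d), np.log)
    return sym_fun(L, np.exp)

# Commuting singular example: the limit should be sqrt(PQ) = diag(0, 0, 2).
P = np.diag([0.0, 1.0, 4.0])
Q = np.diag([9.0, 0.0, 1.0])
assert np.allclose(M_eps(P, Q, 1e-12), np.diag([0.0, 0.0, 2.0]), atol=1e-4)
```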

ViktorStein
  • 4,838

3 Answers

3

I believe that the main principle is that $\exp(\frac12 \log x)= \sqrt{x}$ even for $x$ equal to $0$. This is true by analytic continuation, and exactly the same idea works here: perturb your matrices slightly, and then check that the limit does not depend on the perturbation.

Igor Rivin
  • 25,994
  • 1
  • 19
  • 40
  • Thanks very much for your answer. But for singular matrices I still don't have $\exp(...) = \sqrt{P} \sqrt{Q}$ if $P Q \ne Q P$, right? – ViktorStein Sep 20 '22 at 23:31
  • @Ramanujan No, but the analytic continuation at least gives you a definition that makes sense. How you actually compute it is a separate matter... – Igor Rivin Sep 21 '22 at 02:20
  • One of the reasons I ask is that in the paper the authors are not sure if $$D(P, Q) := \sqrt{\text{tr}\big(P + Q - 2 M(P, Q)\big)}$$ is a metric (that is, whether it satisfies the triangle inequality) and if I want to prove / disprove that, I have to know how to handle the $M$ term. – ViktorStein Sep 21 '22 at 10:37
  • @Ramanujan Well, can you deal with the case where $P, Q$ are non-singular? – Igor Rivin Sep 21 '22 at 14:06
  • Also no, but I hope that if the triangle inequality fails, it does so only when $P$ and/or $Q$ is singular. – ViktorStein Sep 21 '22 at 18:42
  • @Ramanujan That is not going to happen. Can you do it for $P$ and $Q$ commuting? – Igor Rivin Sep 21 '22 at 21:40
  • Yes, if $P Q = Q P$ and $P, Q \in \mathcal S_{++}^d$, then $D(P, Q) = \| \sqrt{P} - \sqrt{Q} \|_F$ (Frobenius norm), so this is a distance. Furthermore, the Golden-Thompson inequality shows that for $P, Q \in \mathcal S_{++}^d$, $D(P, Q) \ge \| \sqrt{P} - \sqrt{Q} \|_F$. – ViktorStein Sep 22 '22 at 06:36
  • Why is that not going to happen? – ViktorStein Sep 22 '22 at 08:30
  • (Btw the link to your CV on your website is down) – ViktorStein Sep 22 '22 at 09:15
  • Is the following limit what you had in mind and is there a way to evaluate it? $$ \lim_{\varepsilon \searrow 0} \exp\left( \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I)\right) = \exp\left( \lim_{\varepsilon \searrow 0} \frac{1}{2} \log(P + \varepsilon I) + \frac{1}{2} \log(Q + \varepsilon I)\right), $$ where the equality is due to the continuity of the matrix exponential. – ViktorStein Dec 10 '22 at 20:32
  • I edited my question to show using the definition in the paper that we must have $\exp(\log(P)) = P$ for any symmetric matrix, as you suggest. – ViktorStein Dec 13 '22 at 00:08
1

The following is pure speculation on my part as to how one might define the geometric mean for matrices which are not positive definite...

First, the log-based $M(P,Q)\,$ is not the same as the conventional matrix geometric mean $$ G(P, Q) := P^{\frac{1}{2}} \left(P^{-\frac{1}{2}} Q P^{-\frac{1}{2}}\right)^{\frac{1}{2}} P^{\frac{1}{2}} $$ for non-commuting matrices.

The conventional definition requires that at least one of the matrices be invertible. The log-based definition seems to require that both matrices are invertible.

Note that the arithmetic-harmonic mean (AHM) equals the geometric mean for positive scalars. This suggests the following algorithm for the matrix case $$\eqalign{ A_0 &= P \quad &H_0 = Q \\ A_{k+1} &= \tfrac 12(A_k+H_k) \qquad &H_{k+1} = 2\big(A_k^++H_k^+\big)^+ \\ }$$ The iteration converges $\big(A_\infty=H_\infty\big)$ and requires no matrix inverses, logarithms, or square roots. Instead it uses pseudoinverses, which are well defined for semidefinite matrices. The algorithm is also unaffected if $P$ and $Q$ are swapped.
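A minimal NumPy sketch of the iteration, using `np.linalg.pinv` for $X^+$ (my own illustration; for well-conditioned positive definite inputs it reproduces the conventional geometric mean $G$ in double precision, though singular inputs may need extended precision):

```python
import numpy as np

def sym_fun(A, f):
    # apply a scalar function to a symmetric matrix via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T

def geo_mean(P, Q):
    # conventional G(P,Q) = P^{1/2} (P^{-1/2} Q P^{-1/2})^{1/2} P^{1/2}, P invertible
    Ph  = sym_fun(P, np.sqrt)
    Pih = sym_fun(P, lambda x: 1.0 / np.sqrt(x))
    return Ph @ sym_fun(Pih @ Q @ Pih, np.sqrt) @ Ph

def ahm(P, Q, iters=80):
    # arithmetic-harmonic mean iteration with pseudoinverses
    A, H = P, Q
    for _ in range(iters):
        A, H = 0.5 * (A + H), 2.0 * np.linalg.pinv(np.linalg.pinv(A) + np.linalg.pinv(H))
    return 0.5 * (A + H)

rng = np.random.default_rng(0)
B, C = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
P = B @ B.T + 3 * np.eye(3)        # well-conditioned SPD test pair
Q = C @ C.T + 3 * np.eye(3)
assert np.allclose(ahm(P, Q), geo_mean(P, Q), atol=1e-6)
```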

Update

Generating small random matrices ($P$ is definite and $Q$ is semi-definite and $P Q \ne Q P$) $$\eqalign{ P &= \left[ \begin{array}{ccc} 70 & 64 & 73 \\ 64 & 93 & 71 \\ 73 & 71 & 98 \\\end{array} \right],\qquad\qquad Q = \left[ \begin{array}{ccc} 113 & 63 & 111 \\ 63 & 50 & 51 \\ 111 & 51 & 117 \\ \end{array} \right] \\ \\ G(P,Q) &= \left[ \begin{array}{ccc} 81.4967 & 50.6175 & 76.2631 \\ 50.6175 & 47.0073 & 35.975 \\ 76.2631 & 35.975 & 79.7012 \\ \end{array} \right] \\ \\ M(P,Q) &= \left[ \begin{array}{ccc} 97.7269-0.1884\mathit{i} & 65.5237-0.0035\mathit{i} & 91.3427-0.0008\mathit{i} \\ 65.5237-0.0035\mathit{i} & 59.8803+0.0988\mathit{i} & 49.7925+0.1196\mathit{i} \\ 91.3427-0.0008\mathit{i} & 49.7925+0.1196\mathit{i} & 93.5964+0.1455\mathit{i} \\ \end{array} \right] \\ \\ A(P,Q) &= \left[ \begin{array}{ccc} 84.0641 & 58.0756 & 84.0195 \\ 58.0756 & 71.2391 & 61.8203 \\ 84.0195 & 61.8203 & 107.4125 \\ \end{array} \right] \quad \Big\{6{\rm\;iterations:\;}\varepsilon\le 10^{-14}\Big\} \\ \\ }$$ So $\;M(P,Q) \;\ne\; G(P,Q) \;\ne\; A(P,Q)$.

When $P$ and $Q$ are both full-rank, the results become
$\quad M(P,Q) \;\ne\; G(P,Q) \;\equiv\; A(P,Q)$.

Update 2

When $PQ=QP\,$ numerical experiments indicate that when both matrices are singular $$G(P,Q) \;\ne\; M(P,Q) \;=\; A(P,Q)$$ otherwise $$G(P,Q) \;=\; M(P,Q) \;=\; A(P,Q)$$

To summarize, numerically there appear to be six cases for semi-definite matrices, according to whether $P^{-1}$ and $Q^{-1}$ exist: $$\eqalign{ \def\m#1{\left[\begin{array}{r|r|r|l}#1\end{array}\right]} \m{ &{\rm both\;exist} &{\rm exactly\;one\;exists} &{\rm neither\;exists} \\ \hline PQ=QP & M=A=G & M=A=G & M=A \\ PQ\ne QP & A=G & & \\ } }$$

greg
  • 35,825
  • @Ramanujan I found examples quite readily and appended one of them to the post. – greg Dec 11 '22 at 12:46
  • Oops, I did not account for the commutation constraint in my testing. Note that AHM equals $G$ for definite matrices without the commutation constraint. – greg Dec 11 '22 at 13:14
  • Could you please define AHM and $^{+}$? I thought $AHM(x, y) = \frac{2}{\frac{1}{x} + \frac{1}{y}}$ for scalars $x, y$ and $P^+$ is the Moore-Penrose pseudo-inverse of $P$. – ViktorStein Dec 11 '22 at 13:24
  • And your method is based on the fact that in the scalar case, for $a_0, b_0 > 0$ the iterations $a_{n + 1} = \frac{a_n + b_n}{2}$ and $b_{n + 1} = \frac{2}{\frac{1}{a_{n + 1}} + \frac{1}{b_n}}$ generate sequences with common limit $\lim_{n \to \infty} a_n = \lim_{n \to \infty} b_n = \sqrt{a_0 b_0}$? – ViktorStein Dec 11 '22 at 14:04
  • That's correct. And AGM-like iterates converge quickly, so it can be useful even in the scalar case, e.g. setting $a=1$ produces a serviceable algorithm for $\sqrt b$. – greg Dec 11 '22 at 14:23
  • So in your calculation $AHM(P, Q)$ is the result of applying the iteration you proposed to $P$ and $Q$ for a finite number of steps? How do you know that, in general, AHM = G for full-rank (that is, positive definite, right?) matrices? – ViktorStein Dec 11 '22 at 14:47
  • Does your definition coincide with what I proposed about the orthogonal projector? – ViktorStein Dec 11 '22 at 14:50
  • And do you know if your definition coincides with the limit definition I gave at the end of my question? – ViktorStein Dec 12 '22 at 23:18
  • Sorry, but I don't know if the iteration converges to your limit definition or not. I should also warn you that the iteration often requires extended precision. In Julia that's as simple as using GenericLinearAlgebra and BigFloat matrices. – greg Dec 13 '22 at 02:35
0

Another (Lie algebra) perspective might be to define $$ \exp\left( \frac{1}{2} \log(P) + \frac{1}{2} \log(Q)\right) := \lim_{n \to \infty} \left(P^{\frac{1}{2n}} Q^{\frac{1}{2n}}\right)^{n}, $$ since $\exp\big(\log(A) + \log(B)\big) = \lim_{n \to \infty} \left( A^{\frac{1}{n}} B^{\frac{1}{n}}\right)^n$ holds for all symmetric positive definite matrices, see e.g. here.
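Numerically, the error of this product formula indeed decays as $n$ grows for positive definite $P, Q$. A NumPy sketch (my own illustration, using the spectral calculus for the fractional powers):

```python
import numpy as np

def sym_fun(A, f):
    # apply a scalar function to a symmetric matrix via its eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * f(lam)) @ U.T

def trotter(P, Q, n):
    # (P^{1/(2n)} Q^{1/(2n)})^n
    step = sym_fun(P, lambda x: x ** (1.0 / (2 * n))) \
         @ sym_fun(Q, lambda x: x ** (1.0 / (2 * n)))
    return np.linalg.matrix_power(step, n)

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)); P = A @ A.T + 3 * np.eye(3)
B = rng.standard_normal((3, 3)); Q = B @ B.T + 3 * np.eye(3)
M = sym_fun(0.5 * sym_fun(P, np.log) + 0.5 * sym_fun(Q, np.log), np.exp)
err = [np.linalg.norm(trotter(P, Q, 2 ** k) - M) for k in (2, 6, 10)]
assert err[0] > err[1] > err[2]    # error shrinks as n grows (O(1/n))
assert err[2] < 1e-2
```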

ViktorStein
  • 4,838