
I am reading Characterization of the Subdifferential of Some Matrix Norms by G.A. Watson. On the first page, the subdifferential of $\|A\|$ is defined as $$\partial\|A\| := \{G\in \mathbb{R}^{m \times n}:\|B\|\geq\|A\| + \operatorname{tr}((B-A)^\top G),\ \forall B \in \mathbb{R}^{m \times n}\}.$$ He then states that $G\in \partial\|A\|$ is equivalent to $$\|A\| = \operatorname{tr}(G^\top A) \quad \text{and}\quad\|G\|_* = \max_{\|B\|\leq 1}\operatorname{tr}(B^\top G)\leq 1.$$ But I can't see how the equivalence is established. Any hint would be greatly appreciated.

Theo Bendit
Cris

2 Answers


Suppose $G \in \partial \|A\|$. Then, considering $B = 0$, we get $$0 \ge \|A\| + \operatorname{tr}((0 - A)^\top G) \implies \operatorname{tr}(G^\top A)=\operatorname{tr}(A^\top G)\ge \|A\|.$$ Considering instead $B = 2A$, we also get $$2\|A\| \ge \|A\| + \operatorname{tr}(A^\top G) \implies \operatorname{tr}(G^\top A) \le \|A\|.$$ Thus, $\operatorname{tr}(G^\top A) = \|A\|$. Substituting back into the definition, this implies, for all $B \in \Bbb{R}^{m \times n}$, $$\|B\| \ge \|A\| + \operatorname{tr}(B^\top G) - \operatorname{tr}(A^\top G) = \operatorname{tr}(B^\top G).$$ Taking the maximum over all $B$ with $\|B\| \le 1$ yields $\|G\|_* \le 1$, as required.

The converse uses the same simplification. If $G$ satisfies the two conditions, then substituting $\operatorname{tr}(A^\top G) = \|A\|$ reduces the inequality that determines membership in $\partial \|A\|$ to $$\|B\| \ge \operatorname{tr}(B^\top G).$$ For $B = 0$ both sides vanish, so the inequality holds. For $B \ne 0$, consider $B' = B / \|B\|$. Since $\|B'\| = 1$ and $\|G\|_* \le 1$, we get $$\operatorname{tr}((B')^\top G) \le \|G\|_* \le 1.$$ But this means $$1 \ge \frac{\operatorname{tr}(B^\top G)}{\|B\|} \implies \operatorname{tr}(B^\top G) \le \|B\|,$$ as needed. Thus $G \in \partial \|A\|$.
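As a numerical sanity check of the equivalence, here is a minimal sketch in plain Python. The choice of norm is an assumption made only for illustration and is not taken from Watson's paper: it uses the entrywise max norm $\|A\| = \max_{i,j}|a_{ij}|$, whose dual under the trace inner product is the entrywise $\ell_1$ norm. All function names are hypothetical helpers. The candidate $G$ puts $\operatorname{sign}(a_{ij})$ at one entry of largest magnitude and zeros elsewhere.

```python
import random

def max_norm(M):
    """Entrywise max norm: the largest absolute entry of M."""
    return max(abs(x) for row in M for x in row)

def sum_norm(M):
    """Entrywise l1 norm, the dual of the entrywise max norm."""
    return sum(abs(x) for row in M for x in row)

def trace_inner(X, Y):
    """tr(X^T Y), i.e. the sum of entrywise products."""
    return sum(x * y for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

def candidate_subgradient(A):
    """Return sign(a_ij) * e_i e_j^T for one entry a_ij of largest magnitude."""
    m, n = len(A), len(A[0])
    i, j = max(((r, c) for r in range(m) for c in range(n)),
               key=lambda rc: abs(A[rc[0]][rc[1]]))
    G = [[0.0] * n for _ in range(m)]
    G[i][j] = 1.0 if A[i][j] >= 0 else -1.0
    return G

def check_subgradient(A, G, trials=1000, seed=0, tol=1e-12):
    """Test ||B|| >= ||A|| + tr((B - A)^T G) on random matrices B."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # ||A|| - tr(A^T G); this is 0 when the first condition holds.
    base = max_norm(A) - trace_inner(A, G)
    for _ in range(trials):
        B = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(m)]
        if max_norm(B) < base + trace_inner(B, G) - tol:
            return False  # found a B that violates the inequality
    return True

A = [[1.0, -3.0, 2.0], [0.5, 2.5, -1.0]]
G = candidate_subgradient(A)

cond_equality = abs(trace_inner(G, A) - max_norm(A)) < 1e-12  # tr(G^T A) = ||A||
cond_dual_ball = sum_norm(G) <= 1.0                           # ||G||_* <= 1
inequality_holds = check_subgradient(A, G)                    # sampled definition

print(cond_equality, cond_dual_ball, inequality_holds)
```

Random sampling can only fail to find a counterexample, so this illustrates the statement without proving it. Scaling $G$ by 2 breaks the dual-norm condition, and the sampled inequality then fails as well.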

Theo Bendit

In fact, when $A \neq 0$, $G\in \partial \|A\|$ if and only if (i) $\|G\|_* =1$ and (ii) $\langle G,A\rangle=\|A\|$, where $\langle G,A\rangle = \operatorname{tr}(G^\top A)$. Indeed, from $\|G\|_*\leq 1$ and $\langle G,A\rangle=\|A\|$ we have $\|A\|=\langle G,A\rangle \leq \|G\|_* \|A\|\leq\|A\|$, and dividing by $\|A\| > 0$ gives $\|G\|_*= 1$. The reverse direction is immediate.
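To see why the assumption $A \neq 0$ is needed, here is a short worked case that uses only the two conditions from the question. For $A = 0$ the condition $\operatorname{tr}(G^\top A) = \|A\|$ reads $0 = 0$ and holds for every $G$, so only the dual-norm bound remains: $$\partial\|0\| = \{G \in \mathbb{R}^{m \times n} : \|G\|_* \le 1\}.$$ In particular $G = 0$ lies in $\partial\|0\|$ with $\|G\|_* = 0 < 1$, so the equality $\|G\|_* = 1$ can fail at $A = 0$.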

Moreover, I think the proof of Theorem 1 in Characterization of the Subdifferential of Some Matrix Norms is flawed, since the singular values of a matrix are not always differentiable. Take $M(t) = tI\in \mathbb{R}^{n \times n}$ as an example: every singular value of $M(t)$ equals $|t|$, which is not differentiable at $t=0$.

The proof in The Convex Analysis of Unitarily Invariant Matrix Functions is a better reference.

Yunfei