Fenchel-Young inequality for matrices

Question

Background: For $f:D\subset\mathbb{R}^d \to \mathbb{R}$ we call $$f^*:D^*\subset\mathbb{R}^d \to \mathbb{R}, ~ y\mapsto \underset{x\in D}{\sup}(\langle y, x \rangle - f(x))$$ the conjugate of $f$, where $D^*:=\{y\in\mathbb{R}^d | \underset{x\in D}{\sup}(\langle y, x \rangle - f(x)) < \infty \}$. From the definition we directly get the Fenchel inequality: $$ \langle y, x \rangle \leq f(x) + f^*(y) $$ If we set $D=\mathbb{R}^d, f(x)=\lVert x \rVert$ for some vector norm $\lVert \cdot \rVert$ and consider the dual norm $ \lVert y \rVert^* = \underset{\lVert x \rVert=1}{\sup}\langle y, x \rangle$ (which in fact is a dual norm thanks to Riesz' representation theorem) we get $$ f^*(y) = \begin{cases} 0, &\lVert y \rVert^* \leq 1 \\ \infty, &\mathrm{otherwise}, \end{cases} $$ and therefore $\langle y, x \rangle \leq \lVert x \rVert$ whenever $\lVert y \rVert^* \leq 1$.

My questions: The author of a paper I'm reading defines something he calls a matrix dual norm by $$ \lVert W \rVert^* := \underset{\lVert u \rVert = \lVert v \rVert = 1}{\sup} u^TWv $$ for some vector norm $\lVert \cdot \rVert$ and $W\in\mathbb{R}^{d\times d}$. He then claims that $$ \langle A, B \rangle_{\mathrm{F}} \leq \lVert A \rVert\qquad (*) $$ by the Fenchel-Young inequality whenever $\lVert B \rVert^* \leq 1$, where $\langle \cdot, \cdot \rangle_{\mathrm{F}}$ is the Frobenius scalar product and, I'm guessing (this is not clearly defined), $\lVert \cdot \rVert$ is the matrix norm induced by the aforementioned vector norm $\lVert \cdot \rVert$.

How is the defined matrix norm $\lVert \cdot \rVert^*$ a dual norm? I'm only familiar with $\mathbb{R}^{d\times d}$ as a Hilbert space when equipped with the Frobenius scalar product, but the dual norm defined here does not seem to coincide with the dual norm induced by the Frobenius scalar product.
Since the author does not use the Frobenius norm as dual norm, can we still apply Fenchel-Young to achieve the inequality (*)?

Thanks a lot!

Edit: I just realized that it might be somewhat exaggerated to use Fenchel-Young here, since for all $x \neq 0, y \in \mathbb{R}^d$ with $\lVert y \rVert^* \leq 1$ we have $$ \langle x, y \rangle = \langle x/\lVert x \rVert, y \rangle \lVert x \rVert \leq \lVert y \rVert^* \lVert x \rVert \leq \lVert x \rVert. $$ My second question therefore reduces to whether the following is true: $$ \langle A, B \rangle_F = \langle A/\lVert A \rVert, B \rangle_F \lVert A \rVert \overset{\mathrm{?}}{\leq} \lVert B \rVert^* \lVert A \rVert. $$ I'm pretty sure it is not, since by choosing $\lVert \cdot \rVert = \lVert \cdot \rVert_1$ we have $$ \lVert B \rVert^* = \underset{\lVert u \rVert_1 = \lVert v \rVert_1 = 1}{\sup} u^TBv = \underset{i,j}{\max}|b_{i,j}|, $$ but choosing $A$ appropriately (one entry $\pm 1$ per column, placed at the entry of largest absolute value in the corresponding column of $B$ and carrying its sign, zeros elsewhere, so that $\lVert A \rVert_1 = 1$) gives $$ \langle A, B \rangle_F = \sum_{j=1}^d \underset{i=1, ..., d}{\max} |b_{ij}| > \lVert B \rVert^* $$ whenever at least two columns of $B$ are nonzero. Could someone confirm?
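
The counterexample in the edit above can be checked numerically. The following is only a sketch under the question's own assumptions: $\lVert A \rVert_1$ is taken to be the induced matrix 1-norm (maximum absolute column sum), and the paper's $\lVert B \rVert^*$ for the vector 1-norm is taken to be $\max_{i,j}|b_{i,j}|$ as computed in the question. The random matrix `B`, the seed, and all variable names are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
B = rng.standard_normal((d, d))

# Build A with a single +/-1 entry per column, located at the entry of
# largest absolute value in the same column of B and carrying its sign.
rows = np.argmax(np.abs(B), axis=0)
cols = np.arange(d)
A = np.zeros((d, d))
A[rows, cols] = np.sign(B[rows, cols])

induced_1_norm_A = np.linalg.norm(A, 1)  # max absolute column sum
frobenius_inner = np.sum(A * B)          # <A, B>_F = Tr(A^T B)
paper_dual_B = np.max(np.abs(B))         # sup of u^T B v over l1-unit u, v

print(induced_1_norm_A, frobenius_inner, paper_dual_B)
```

If the inequality $(*)$ held for this choice of norms, `frobenius_inner` could not exceed `paper_dual_B * induced_1_norm_A`; in this sketch it does, which matches the suspicion voiced in the question.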

Assuming $\langle A,B\rangle_F = \operatorname{Tr}(A^\top B)$, then I believe the dual norm you are referring to is the nuclear norm. Check out appendix A.1.6. (p. 637) here: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf — V.S.e.H., Feb 04 '21 at 21:47
I don't completely follow. Do you mean that $\lVert \cdot \rVert^*$ is the nuclear norm? I think it depends on the underlying vector norm $\lVert \cdot \rVert$. E.g. if we choose that to be $\lVert \cdot \rVert_1$ we get $\lVert A \rVert^* = \max|a_{i,j}|$. If we choose it to be $\lVert \cdot \rVert_2$ we get $\lVert A \rVert^* = \max|\lambda(A)|$, the maximum absolute eigenvalue. — Xander, Feb 04 '21 at 22:06
The question could use some work. What exactly is the definition of $\|\cdot\|^{*}$? Which norm is $\|v\|$ (on vectors $v$)? Which norm is $\|A\|$ on matrices $A$? If you can't make that clear in your question, at least point to the page in the paper where you think these issues are addressed. — , Feb 05 '21 at 14:26
@PeterMorfe The definition of $\lVert \cdot \rVert^*$ is in the question, and which primal norm is used is of no importance. The question is about whether two definitions of a dual norm are equivalent for certain norms, and whether the Fenchel-Young inequality holds for the second definition. — V.S.e.H., Feb 05 '21 at 14:32
@PeterMorfe I partially agree with you in that I did not specify the matrix norm $\lVert A \rVert$, and this is part of my problem. But that isn't done in the paper either. I assume that to be the operator norm induced by the vector norm $\lVert \cdot \rVert$. All the paper's notation is introduced on p.3 and 4. — Xander, Feb 05 '21 at 14:54

V.S.e.H. · Answer 1 · 2021-02-05T18:44:01.157

2

If we define the dual norm on $\mathbb{R}^{m\times n}$ as $$ \lVert X \rVert^* = \sup\{\operatorname{Tr}(X^\top Y)~\vert~\lVert Y \rVert \leq 1\} = \sup\{\lvert\operatorname{Tr}(X^\top Y)\rvert~\vert~\lVert Y \rVert \leq 1\}, $$ then for every $Y \neq 0$ $$ \langle X,Y\rangle_F = \langle X,Y/\lVert Y\rVert\rangle_F\lVert Y\rVert\leq \lVert X \rVert^*\lVert Y\rVert, $$ and the inequality is trivial for $Y = 0$.
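
As a quick sanity check of this inequality, here is a minimal sketch for one concrete pair of norms. It assumes that $\lVert \cdot \rVert$ is the spectral norm, so that the trace dual $\lVert \cdot \rVert^*$ is the nuclear norm; the helper name `duality_gap`, the matrix shapes, and the seed are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def duality_gap(X, Y):
    """Return ||X||_nuc * ||Y||_2 - <X, Y>_F for real matrices X, Y."""
    inner = np.trace(X.T @ Y)  # Frobenius inner product
    return np.linalg.norm(X, "nuc") * np.linalg.norm(Y, 2) - inner

# Evaluate the gap on many random pairs; it should never be negative.
gaps = [
    duality_gap(rng.standard_normal((3, 5)), rng.standard_normal((3, 5)))
    for _ in range(1000)
]
smallest_gap = min(gaps)
print(smallest_gap >= 0)
```

A random search like this cannot prove the inequality, but a negative gap would immediately expose a wrong pairing of primal and dual norm.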

Furthermore, we have an induced dual matrix norm $\lVert A \rVert^*$ equivalently defined as $$ \lVert A \rVert^* = \sup_{\lVert x\rVert^* = 1,\lVert y\rVert=1}x^\top Ay. $$

Proof: From the definition of dual norm on vectors: $$ \lVert z\rVert^* = \sup_{\lVert y \rVert=1}\lvert y^\top z\rvert. $$

Thus $$ \lVert A\rVert^* = \sup_{\lVert x\rVert^*=1}\lVert Ax\rVert^* = \sup_{\lVert x\rVert^*=1,\lVert y\rVert=1}\lvert y^\top Ax\rvert. $$

EDIT 1: To complement the answer a bit, we also have from Fenchel-Young on vectors $$ x^\top A^\top By\leq \lVert Ax\rVert^*\lVert By\rVert \leq \lVert A\rVert^*\lVert B\rVert \lVert x\rVert^*\lVert y\rVert. $$

The dual norm of the spectral norm $\lVert A \rVert_2$ is not itself. It is in fact the so-called nuclear norm, and a proof is available here: https://math.stackexchange.com/a/1145246/443030
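
The duality between the spectral and the nuclear norm can also be illustrated numerically. The sketch below assumes the trace pairing $\operatorname{Tr}(X^\top Y)$ from the first definition; the matrix `X`, its shape, and the seed are arbitrary illustrative choices. With the thin SVD $X = U\Sigma V^\top$, the matrix $Y = UV^\top$ has spectral norm 1 and attains the supremum.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 3))

# Thin SVD: X = U @ diag(s) @ Vt, with U of shape (4, 3) and Vt of shape (3, 3).
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Candidate maximizer of Tr(X^T Y) over the spectral-norm unit ball.
Y = U @ Vt

spectral_Y = np.linalg.norm(Y, 2)      # largest singular value of Y
attained = np.trace(X.T @ Y)           # equals the sum of singular values of X
nuclear_X = np.linalg.norm(X, "nuc")
spectral_X = np.linalg.norm(X, 2)

print(spectral_Y, attained, nuclear_X, spectral_X)
```

For a matrix of rank greater than one, `nuclear_X` is strictly larger than `spectral_X`, which is the sense in which the spectral norm is not its own dual under the trace pairing.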

$\lVert A\rVert_1$ is the dual of $\lVert A\rVert_\infty$, for matrices, and the inequality still holds.

edited Feb 05 '21 at 18:44

answered Feb 05 '21 at 13:55

V.S.e.H.


Thanks a lot, that really helped! I still think that for the dual norm defined in the paper, the inequality does not hold like that. (It is used in the proof of proposition 8. Since it is only used to find an upper bound, we can easily correct it by a factor.) Also, I think that your norm $\lVert A \rVert^*$ is slightly different from the paper's, taking the supremum over $\lVert \cdot \rVert^*$ and $\lVert \cdot \rVert$ whereas the original takes the sup over $\lVert \cdot \rVert$ twice. It seems that's only the same for $\lVert \cdot \rVert = \lVert \cdot \rVert_2$ – Xander Feb 05 '21 at 14:58
You're welcome. It is very possible that they made a mistake by missing a $*$; it happens all the time in papers, mine included. Indeed, they are the same for the Euclidean norm, but in that case the induced dual is in fact the spectral norm, whose dual is not itself. Also, I am still not convinced that the two definitions of the dual norm are equivalent. I will look a bit later, and maybe even try to prove it. – V.S.e.H. Feb 05 '21 at 15:04
What is the norm $Y \mapsto \|Y\|$ meant to be in your answer? The proof that $\|\cdot\|^{*}$ is given by its second formulation never seems to return to the first formulation. (I can show that if the vector norm $\|\cdot\|$ is the Euclidean norm, then the matrix norm $\|\cdot\|$ has to be the nuclear norm.) – Feb 05 '21 at 16:55
@PeterMorfe It can be any norm. Admittedly, I have not shown that the two definitions are equivalent, as I stated in the previous comment. – V.S.e.H. Feb 05 '21 at 17:56
It seems to come down to how this matrix norm is related to the vector norm. I think if $\lVert Y \rVert$ is induced by the vector norm $\lVert \cdot \rVert$, the dual norms are not equivalent in general. Taking $\lVert \cdot \rVert = \lVert \cdot \rVert_1$ would be a counterexample, giving $\lVert X \rVert^* = \sum_j \max_i |x_{i,j}|$ but $\lVert A \rVert^* = \max_j \sum_i |a_{i,j}|$. – Xander Feb 05 '21 at 18:14
Also, it is apparent from other statements that the author really intended to write $\lVert A \rVert^* = \underset{\lVert u \rVert = \lVert v \rVert = 1}{\sup} u^TAv$. In that case, what seems to make the dual norms coincide (based on the examples $\lVert \cdot \rVert=\lVert \cdot \rVert_1$ and $\lVert \cdot \rVert=\lVert \cdot \rVert_{\infty}$) is to interpret the matrix $Y$ as a $d^2$-dimensional vector and apply the same vector norm to it, i.e. for $\lVert \cdot \rVert=\lVert \cdot \rVert_{\infty}$ define $\lVert Y \rVert:=\max_{i,j}|y_{i,j}|$. – Xander Feb 05 '21 at 18:21
@Xander Indeed. If the primal norm is not an induced norm, i.e. it is a vector norm applied to a matrix by flattening it into a vector, or it is a Schatten norm, then the two definitions should not be equivalent. Also, from the first definition we know that the dual of the spectral norm is not itself, but the nuclear norm. But in the second definition we clearly see the opposite. But we should be able to derive some Fenchel-Young-like bound consistent with both definitions? – V.S.e.H. Feb 05 '21 at 18:56
@bodil, what does "it can be any norm" mean? In general, if $\|\cdot\|$ is some matrix norm and I define $\|\cdot\|^{*}$ by $\|X\|^{*} = \sup \left\{ \text{tr}(X^{T} Y) \, \mid \, \|Y\| \leq 1 \right\}$, then $\|Y\| = \sup \left\{\text{tr}(X^{T}Y) \, \mid \, \|X\|^{*} \leq 1 \right\}$. That is, $\|\cdot\|$ in your post is uniquely determined by $\|\cdot\|^{*}$ and vice versa. Unless you specify $\|\cdot\|$, your claim (and subsequent "proof") that $\|X\|^{*}$ is the operator norm associated with the vector norm $\|\cdot\|^{*}$ is meaningless. – Feb 09 '21 at 18:14
@PeterMorfe Okay, since you're not reading my comments, I will update the answer. – V.S.e.H. Feb 09 '21 at 19:10
