
Let $X \in \mathbb{R}^{a \times b}$ and

$$\|X\|_2 = \sigma_{\max}(X) = \sqrt{\lambda_{\max} \left( X^T X \right)}$$

How can I compute $\nabla_X \|AX\|_2$, where $A \in \mathbb{R}^{c \times a}$ is some known matrix?

pulosky
  • 665

1 Answer


Consider a matrix $Y$ and its SVD $$Y = \sum_{k=1}^r\sigma_ku_kv_k^T$$ and let $\,\phi=\|Y\|_2=\sigma_1\,$ be the spectral norm $($assuming that the singular values are ordered such that $\sigma_1>\sigma_2>\sigma_3>\ldots>\sigma_r>0\,)$.

The gradient of the norm is $$\frac{\partial\phi}{\partial Y} = u_1v_1^T$$
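A brief sketch of why this holds, assuming $\sigma_1$ is simple so that the unit vectors $u_1,v_1$ vary smoothly with $Y$. It uses $Yv_1=\sigma_1u_1$, $\,Y^Tu_1=\sigma_1v_1$, and $u_1^Tdu_1=v_1^Tdv_1=0$ (which follows from differentiating $u_1^Tu_1=v_1^Tv_1=1$), with $A:B={\rm tr}(A^TB)$: $$\eqalign{ \sigma_1 &= u_1^TYv_1 \cr d\sigma_1 &= u_1^T\,dY\,v_1 + du_1^T\,Yv_1 + u_1^TY\,dv_1 \cr &= u_1^T\,dY\,v_1 + \sigma_1\,du_1^Tu_1 + \sigma_1\,v_1^Tdv_1 \cr &= u_1^T\,dY\,v_1 \cr &= u_1v_1^T:dY \cr }$$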

Write the differential in terms of this gradient and perform a change of variables $Y=AX$ $$\eqalign{ d\phi &= u_1v_1^T:dY \cr &= u_1v_1^T:A\,dX \cr &= A^Tu_1v_1^T:dX \cr \frac{\partial\phi}{\partial X} &= A^Tu_1v_1^T \cr }$$ to obtain the desired gradient.

A colon is used to denote the trace/Frobenius product, i.e. $A:B={\rm tr}(A^TB),\,$ in some of the steps above.
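The result above can be sanity-checked numerically. The following is a minimal NumPy sketch, not a definitive implementation: the helper names `spectral_norm_grad` and `finite_difference_grad`, the matrix sizes, and the random test data are all assumptions made for illustration. It compares $A^Tu_1v_1^T$ with a central finite-difference approximation, and is only meaningful when $\sigma_1$ is simple.

```python
import numpy as np

def spectral_norm_grad(A, X):
    """Return (||AX||_2, A^T u_1 v_1^T), assuming sigma_1 of AX is simple."""
    U, s, Vt = np.linalg.svd(A @ X, full_matrices=False)
    u1, v1 = U[:, 0], Vt[0, :]          # leading left/right singular vectors
    return s[0], A.T @ np.outer(u1, v1)

def finite_difference_grad(A, X, h=1e-6):
    """Entrywise central-difference approximation of the same gradient."""
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            f_plus = np.linalg.norm(A @ (X + E), 2)   # ord=2 is the spectral norm
            f_minus = np.linalg.norm(A @ (X - E), 2)
            G[i, j] = (f_plus - f_minus) / (2 * h)
    return G

# Hypothetical test data: A is c x a, X is a x b.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 5))

phi, G = spectral_norm_grad(A, X)
G_fd = finite_difference_grad(A, X)
max_err = np.max(np.abs(G - G_fd))
print("max abs difference:", max_err)
```

For generic random matrices the leading singular value is simple, so the two gradients should agree up to finite-difference error. The outer product $u_1v_1^T$ is unaffected by the sign ambiguity of the SVD, since both vectors flip sign together.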

If the first few singular values are identical, $($e.g. $\sigma_1=\sigma_2=\sigma_3)$, then the result changes slightly $$\eqalign{ \frac{\partial\phi}{\partial X} &= \sum_{k=1}^3A^Tu_kv_k^T \cr }$$

greg
  • 35,825
  • Could you please expand on why $\frac{\partial \phi}{\partial Y}=u_1 v_1^T$? – pulosky May 27 '18 at 00:23
  • I'll refer you to this question for the details. – greg May 27 '18 at 00:48
  • @greg, $\|AX\|_2$ admits a derivative essentially when the multiplicity of the largest singular value is locally constant. Otherwise, cf. the case when $U={\rm diag}(t+2,2t+2)$ and $t_0=0$. –  Apr 13 '20 at 17:25
  • @greg, if $\sigma_1=\sigma_2=\sigma_3$ at a point, then, in general, $\sigma_1$ has no derivative. cf. my post and my comment in https://math.stackexchange.com/questions/3601351/gradient-of-a-mapsto-sigma-i-a/3630189#3630189 –  Apr 18 '20 at 21:40
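
The diagonal example from the comments can be checked numerically. This is a small NumPy sketch under stated assumptions: the helper name `sigma_max` and the step size `h` are illustrative choices, not part of the original discussion. It estimates the one-sided slopes of $t\mapsto\sigma_{\max}\big({\rm diag}(t+2,\,2t+2)\big)$ at $t_0=0$, where the two singular values coincide.

```python
import numpy as np

def sigma_max(t):
    """Largest singular value of diag(t + 2, 2t + 2)."""
    return np.linalg.norm(np.diag([t + 2.0, 2.0 * t + 2.0]), 2)

h = 1e-6  # illustrative step size

# One-sided difference quotients at t0 = 0, where both singular values equal 2.
right_slope = (sigma_max(h) - sigma_max(0.0)) / h
left_slope = (sigma_max(0.0) - sigma_max(-h)) / h

print("right slope:", right_slope)
print("left slope:", left_slope)
```

For $t>0$ the largest singular value is $2t+2$ and for $t<0$ it is $t+2$, so the two one-sided slopes differ and no ordinary derivative exists at $t_0=0$.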