Fix $A\in M_n(\mathbb{C})$ and let $f : M_{nm}(\mathbb{C}) \to \mathbb{R}$ be defined as $f(S) = \tfrac{1}{2}\|A - S\,S^\mathsf{T}\|_\mathsf{F}^2$ (yes, I do mean the transpose and not the adjoint). I want to compute $\frac{\partial f}{\partial S}$. From the Matrix Cookbook, Section 2.8.1, we can use the chain rule $$\frac{\partial f}{\partial S_{ij}} = -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})\frac{\partial S S^\mathsf{T}}{\partial S_{ij}}\right].$$

Working with this, and writing $s_{ij}$ for the entries of $S$, we can see $$\frac{\partial (S S^\mathsf{T})_{kl}}{\partial S_{ij}} = \begin{cases}2 s_{ij} & \text{if } k=l=i\\s_{lj} & \text{if }k=i\neq l\\s_{kj} & \text{if }l=i\neq k\\ 0 & \text{otherwise}\end{cases}$$

so $$\frac{\partial S S^\mathsf{T}}{\partial S_{ij}} = S_je_i^\mathsf{T} + e_iS_j^\mathsf{T}$$ where $S_j$ is the $j$-th column of $S$. It follows that $$\frac{\partial f}{\partial S_{ij}} = -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})(S_je_i^\mathsf{T} + e_iS_j^\mathsf{T})\right].$$
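As a sanity check on the entry-wise derivative above, here is a minimal NumPy sketch that compares $S_je_i^\mathsf{T} + e_iS_j^\mathsf{T}$ against a central finite difference of $SS^\mathsf{T}$. It assumes real matrices, and the helper names (`d_SSt_closed_form`, `d_SSt_finite_diff`) and the sizes are made up for illustration.

```python
import numpy as np

# Assumption: real-valued S, small random test sizes.
rng = np.random.default_rng(0)
n, m = 4, 3
S = rng.standard_normal((n, m))

def d_SSt_closed_form(S, i, j):
    """S_j e_i^T + e_i S_j^T, with S_j the j-th column of S."""
    e_i = np.zeros(S.shape[0])
    e_i[i] = 1.0
    S_j = S[:, j]
    return np.outer(S_j, e_i) + np.outer(e_i, S_j)

def d_SSt_finite_diff(S, i, j, h=1e-6):
    """Central difference of S S^T with respect to the single entry S[i, j]."""
    E = np.zeros_like(S)
    E[i, j] = h
    return ((S + E) @ (S + E).T - (S - E) @ (S - E).T) / (2 * h)

# Largest entry-wise discrepancy over all (i, j).
max_err = max(
    np.abs(d_SSt_closed_form(S, i, j) - d_SSt_finite_diff(S, i, j)).max()
    for i in range(n)
    for j in range(m)
)
print(max_err)
```

Since $SS^\mathsf{T}$ is quadratic in $S$, the central difference is exact up to rounding, so the printed discrepancy should be tiny.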

How can I simplify this further? Ideally I would have a closed form expression for $f'(S)$ without having to index with coordinates.

cdipaolo

3 Answers


Using the double-contraction product, i.e. $$A:B={\rm tr}(A^TB)$$ and the auxiliary variable $$M=SS^T-A$$ you can write the function, its differential, and its gradient very succinctly: $$\eqalign{ f &= \frac{1}{2}\,M:M \cr df &= M:dM = M:(dS\,S^T+S\,dS^T) \cr &= MS:dS + S^TM:dS^T \cr &= (M+M^T)S:dS \cr \frac{\partial f}{\partial S} &= (M+M^T)S = (2SS^T-A^T-A)S \cr }$$
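The closed form $(M+M^T)S$ is easy to test numerically. Below is a minimal NumPy sketch, assuming real matrices (so that the transpose-based formula applies), which compares it with a central finite-difference gradient of $f$. The function names and test sizes are hypothetical.

```python
import numpy as np

def f(S, A):
    """f(S) = 0.5 * ||A - S S^T||_F^2 (real matrices assumed)."""
    return 0.5 * np.linalg.norm(A - S @ S.T, "fro") ** 2

def grad_f(S, A):
    """Closed form (M + M^T) S with M = S S^T - A."""
    M = S @ S.T - A
    return (M + M.T) @ S

def numerical_grad(S, A, h=1e-6):
    """Entry-by-entry central finite differences of f."""
    G = np.zeros_like(S)
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            E = np.zeros_like(S)
            E[i, j] = h
            G[i, j] = (f(S + E, A) - f(S - E, A)) / (2 * h)
    return G

rng = np.random.default_rng(1)
n, m = 5, 3
A = rng.standard_normal((n, n))  # deliberately not symmetric
S = rng.standard_normal((n, m))

grad_err = np.abs(grad_f(S, A) - numerical_grad(S, A)).max()
print(grad_err)
```

Using a non-symmetric $A$ matters here: with a symmetric $A$ the check could not distinguish $(M+M^T)S$ from $2MS$.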

frank

To begin, write $$ -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})(S_je_i^\mathsf{T} + e_iS_j^\mathsf{T})\right] = -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})S_je_i^\mathsf{T}\right] -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})e_iS_j^\mathsf{T}\right] $$ Now, note that $$ \begin{aligned} -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})S_je_i^\mathsf{T}\right] &= -\mathsf{tr}\,\left[e_i^\mathsf{T}(A^\mathsf{T} - SS^\mathsf{T})S_j\right] \\ &= -e_i^\mathsf{T}(A^\mathsf{T} - SS^\mathsf{T})S_j \\ &= -e_i^\mathsf{T}(A^\mathsf{T} - SS^\mathsf{T})(S e_j) \\ &= -[(A^\mathsf{T} - SS^\mathsf{T})S]_{i,j} \end{aligned} $$ Similarly, $$ -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})e_iS_j^\mathsf{T}\right] = -[S^\mathsf{T}(A^\mathsf{T} - SS^\mathsf{T})]_{j,i} = -[(A - SS^\mathsf{T})S]_{i,j} $$ All together, we have $$ \frac{\partial f}{\partial S_{i,j}} = -\mathsf{tr}\,\left[(A^\mathsf{T} - SS^\mathsf{T})(S_je_i^\mathsf{T} + e_iS_j^\mathsf{T})\right] = -[(A + A^\mathsf{T} - 2SS^\mathsf{T})S]_{i,j} $$ Your matrix form for $f'(S)$ depends on which convention you follow, so I'll leave the rest to you.
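The two trace identities used above can be checked directly. The following NumPy sketch, assuming real matrices and with variable names chosen only for illustration, evaluates both traces for every $(i,j)$ and compares them with the corresponding matrix entries.

```python
import numpy as np

# Assumption: real-valued A and S, small random test sizes.
rng = np.random.default_rng(2)
n, m = 4, 3
A = rng.standard_normal((n, n))
S = rng.standard_normal((n, m))
B = A.T - S @ S.T  # the common factor A^T - S S^T

first = np.zeros((n, m))   # tr[B S_j e_i^T]
second = np.zeros((n, m))  # tr[B e_i S_j^T]
for i in range(n):
    e_i = np.zeros(n)
    e_i[i] = 1.0
    for j in range(m):
        S_j = S[:, j]  # j-th column of S
        first[i, j] = np.trace(B @ np.outer(S_j, e_i))
        second[i, j] = np.trace(B @ np.outer(e_i, S_j))

ok_first = np.allclose(first, B @ S)      # [(A^T - S S^T) S]_{i,j}
ok_second = np.allclose(second, B.T @ S)  # [(A - S S^T) S]_{i,j}
ok_total = np.allclose(-(first + second), -(A + A.T - 2 * S @ S.T) @ S)
print(ok_first, ok_second, ok_total)
```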

Ben Grossmann
  • Let me know if any of these steps need clarification; I was pretty terse since I just wanted to get to an answer. – Ben Grossmann Jun 20 '17 at 03:17
  • Awesome thanks for such a quick response! :) – cdipaolo Jun 20 '17 at 03:29
  • What @Omnomnomnom has is correct, but since I like matrix derivations better, I added an answer deriving it differently for the whole matrix $S$. – Alt Jun 20 '17 at 18:45

\begin{equation} \begin{aligned} \|A - S\,S^\mathsf{T}\|_\mathsf{F}^2 &= \operatorname{tr}((A-SS^T)^T(A-SS^T)) \\ &= \operatorname{tr}(A^TA)-2 \operatorname{tr}(A^TSS^T)+\operatorname{tr}(SS^TSS^T) \end{aligned} \end{equation}

\begin{equation} \begin{aligned} \frac{\partial f(S)}{\partial S}&=\tfrac{1}{2}\left(\frac{\partial}{\partial S} \operatorname{tr}(A^TA)-2\frac{\partial}{\partial S}\operatorname{tr}(A^TSS^T)+\frac{\partial}{\partial S} \operatorname{tr}(SS^TSS^T)\right) \\ &= \tfrac{1}{2}\left(0-2\frac{\partial \operatorname{tr}(A^TSS^T)}{\partial S}+\frac{\partial \operatorname{tr}(SS^TSS^T)}{\partial S}\right) \\ &\text{with some abuse of notation:}\\ &=\tfrac{1}{2}\left(-2\frac{ \operatorname{tr}(\partial(A^TSS^T))}{\partial S}+\frac{\operatorname{tr}(\partial(SS^TSS^T))}{\partial S}\right) \\ &\text{by the cyclic property of the trace:}\\ &= \tfrac{1}{2}\left(-2\frac{ \operatorname{tr}(S^TA^T\partial S)+\operatorname{tr}(A^TS\partial S^T)}{\partial S}+\frac{\operatorname{tr}(S^TSS^T\partial S)+\operatorname{tr}(SS^TS\partial S^T)+\operatorname{tr}(S^TSS^T\partial S)+\operatorname{tr}(SS^TS\partial S^T)}{\partial S} \right) \\ &= \tfrac{1}{2}(-2 AS- 2 A^TS+SS^TS+SS^TS+SS^TS+SS^TS) \\ &=\tfrac{1}{2}( -2 (A+A^T)S+ 4 SS^TS) \\ &= -(A+A^T)S+ 2 SS^TS \\ &= (2 SS^T-A-A^T)S \end{aligned} \end{equation}
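The building blocks of this derivation can be checked one at a time. Here is a minimal NumPy sketch, assuming real matrices, that verifies the trace expansion of the squared norm and the gradients of the two non-constant trace terms, $\partial\operatorname{tr}(A^TSS^T)/\partial S = (A+A^T)S$ and $\partial\operatorname{tr}(SS^TSS^T)/\partial S = 4SS^TS$, by central finite differences. The helper `central_diff_grad` and the names `quad` and `quart` are made up for this illustration.

```python
import numpy as np

def central_diff_grad(g, S, h=1e-6):
    """Entry-by-entry central finite-difference gradient of a scalar function g."""
    G = np.zeros_like(S)
    for idx in np.ndindex(*S.shape):
        E = np.zeros_like(S)
        E[idx] = h
        G[idx] = (g(S + E) - g(S - E)) / (2 * h)
    return G

# Assumption: real-valued A and S, small random test sizes.
rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n))
S = rng.standard_normal((n, m))

def quad(X):
    """tr(A^T X X^T), the term quadratic in X."""
    return np.trace(A.T @ X @ X.T)

def quart(X):
    """tr(X X^T X X^T), the term quartic in X."""
    return np.trace(X @ X.T @ X @ X.T)

# Trace expansion of the squared Frobenius norm.
expansion_ok = np.isclose(
    np.linalg.norm(A - S @ S.T, "fro") ** 2,
    np.trace(A.T @ A) - 2 * quad(S) + quart(S),
)
# Gradients of the individual trace terms.
quad_ok = np.allclose(central_diff_grad(quad, S), (A + A.T) @ S, atol=1e-5)
quart_ok = np.allclose(central_diff_grad(quart, S), 4 * S @ S.T @ S, atol=1e-5)
print(expansion_ok, quad_ok, quart_ok)
```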

Alt