
Let $F(A)$ be a matrix-valued function, operating on a real-valued matrix $A \in \mathbb{R}^{m, n}$, that applies a scalar function $f(\lambda)$ to the singular values of $A$. That is, suppose $A$ has the singular value decomposition $$ A = U \Sigma V^\top, $$ with $U, V$ orthogonal and $\Sigma$ diagonal; then $$ B = F(A) = U F(\Sigma) V^\top, $$ where $F(\Sigma)$ is computed by applying $f$ entry-wise to the diagonal elements of $\Sigma$. Let $g$ be a scalar-valued function that depends on the matrix $B$.
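Here is a minimal NumPy sketch of $F(A)$ as defined above. It assumes the thin SVD returned by `np.linalg.svd(A, full_matrices=False)`, and the helper name `svd_matrix_function` is hypothetical. Applying $f$ to the $\min(m,n)$ singular values of the thin SVD gives the same $B$ as the full SVD, because the extra columns of $U$ or $V$ in the full SVD only meet zero rows or columns of $\Sigma$.

```python
import numpy as np

def svd_matrix_function(A, f):
    """Return B = U f(Sigma) V^T, with f applied elementwise to the singular values of A.

    Hypothetical helper; f must accept and return a 1-D NumPy array.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD: A = U @ diag(s) @ Vt
    return (U * f(s)) @ Vt                             # U * f(s) scales column k of U by f(s_k)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

B_id = svd_matrix_function(A, lambda x: x)    # f(x) = x reproduces A
B_one = svd_matrix_function(A, np.ones_like)  # f(x) = 1 gives U V^T
print(np.allclose(B_id, A))                                      # True
print(np.allclose(np.linalg.svd(B_one, compute_uv=False), 1.0))  # True
```

With $f(x)=x$ the sketch returns $A$ itself; with $f(x)=1$ it returns $UV^\top$, whose singular values are all $1$.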

Question: How do we find $\dfrac{\partial g(B)}{\partial A}$? Here, $\dfrac{\partial g(B)}{\partial A} \in \mathbb{R}^{m,n}$ is the matrix whose $(i,j)$-entry is $\dfrac{\partial g(B)}{\partial A_{i,j}}$. Also, I'm looking for a closed-form expression for it (if one exists), not just a procedure to compute the partial derivatives.

Steve

2 Answers


To go one step further than Greg's excellent answer, note that $\lambda_k$ can be simplified a bit:

\begin{eqnarray} \lambda_k &=& \mathbf{g}^T \mathbf{K} \mathbf{Q} \mathbf{e}_k \\ &=& q_k \mathbf{g}^T \mathbf{K} \mathbf{e}_k \\ &=& q_k \mathbf{g}^T \mathrm{vec}(\mathbf{u}_k \mathbf{v}_k^T) \\ &=& \mathbf{G}: q_k\mathbf{u}_k \mathbf{v}_k^T \end{eqnarray}
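This simplification can be checked numerically. The NumPy sketch below makes illustrative assumptions: $\beta(s)=s^3$, so that $q_k = 3\sigma_k^2$; a random matrix standing in for the known gradient $\mathbf G$; and column-stacking $\mathrm{vec}$ (NumPy's `order="F"`), under which $\mathbf K\mathbf e_k = \mathbf v_k\otimes\mathbf u_k = \mathrm{vec}(\mathbf u_k\mathbf v_k^T)$. The helper `khatri_rao` is written here for the sketch, not taken from a library.

```python
import numpy as np

def khatri_rao(V, U):
    """Column-wise Khatri-Rao product V (box) U: column k is kron(V[:, k], U[:, k])."""
    return np.column_stack([np.kron(V[:, k], U[:, k]) for k in range(U.shape[1])])

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
q = 3 * s**2                     # beta'(s) for the illustrative choice beta(s) = s**3
G = rng.standard_normal((m, n))  # random stand-in for the known gradient dgamma/dB
g = G.reshape(-1, order="F")     # vec(G), stacking columns

K = khatri_rao(V, U)
lam_long = g @ K @ np.diag(q)                      # entries g^T K Q e_k
lam_short = q * np.einsum("ik,ij,jk->k", U, G, V)  # entries q_k u_k^T G v_k
print(np.allclose(lam_long, lam_short))  # True
```

The short form needs only $q_k\,\mathbf u_k^T\mathbf G\mathbf v_k$ and never forms the $mn\times r$ matrix $\mathbf K$.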

Steph

$ \def\bbR#1{{\mathbb R}^{#1}} \def\b{\beta}\def\g{\gamma} \def\s{\sigma}\def\S{\Sigma}\def\e{\varepsilon} \def\l{\lambda}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\rank#1{\operatorname{rank}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\bx{\boxtimes} $Assume that the SVD has distinct singular values $\{\s_k\}$ $$\eqalign{ A &= USV^T = \sum_{k=1}^r \s_k u_kv_k^T \\ A &\in\bbR{m\times n} \qquad U \in\bbR{m\times r},\; S \in\bbR{r\times r},\; V \in\bbR{n\times r} \\ r &= \rank{A} \\ }$$Let's rename the functions $(f,g)\to(\b,\g),\,$ so that we can write the mnemonic equations $$\eqalign{ B &= \b(A) = U\b(S)V^T \quad &\{{\rm matrix\;function}\} \\ \g &= \g(B) \quad &\{{\rm scalar\;function}\} \\ }$$ and for typing convenience, define the variables $$\eqalign{ s &= \diag S \quad &\{{\rm vector\;of\;singular\;values}\} \\ p &= \b(s) \qquad &\{{\rm function\;applied\;elementwise}\} \\ q &= \b'(s) \qquad &\{{\rm derivative\;applied\;elementwise}\} \\ P &= \b(S) \,= \Diag p \;& \\ Q &= \b'(S)\!= \Diag q \\ \\ u_k &= U\e_k \\ v_k &= V\e_k \quad &\{\e_k\,{\rm are\;the\;standard\;basis\;vectors}\} \\ G &= \grad{\g}{B} \quad &\{{\rm gradient\;of\;}\g\;{\rm is\;\c{known}}\} \\ g &= \vecc G \\ b &= \vecc B \\ K &= {V\bx U} \quad &\{{\rm Khatri-Rao\;product}\} \\ \l_k &= g^TKQ\e_k \\ }$$ Use the column-wise Khatri-Rao product to expand $\vecc B$ and calculate its differential. 
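The column-wise Khatri-Rao expansion $\operatorname{vec}(B) = Kp$ with $K = V\boxtimes U$ can be checked numerically. This NumPy sketch assumes $\beta(s)=\tanh(s)$ as an illustrative choice; `khatri_rao` is a hypothetical helper (column $k$ of $V\boxtimes U$ is $v_k\otimes u_k$), and $\operatorname{vec}$ stacks columns, which corresponds to NumPy's `order="F"`.

```python
import numpy as np

def khatri_rao(V, U):
    """Column-wise Khatri-Rao product V (box) U: column k is kron(V[:, k], U[:, k])."""
    assert U.shape[1] == V.shape[1], "U and V need the same number of columns"
    return np.column_stack([np.kron(V[:, k], U[:, k]) for k in range(U.shape[1])])

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

p = np.tanh(s)                # p = beta(s) for the illustrative choice beta = tanh
B = U @ np.diag(p) @ Vt       # B = U Diag(p) V^T
b = B.reshape(-1, order="F")  # vec(B): stack the columns of B

K = khatri_rao(V, U)
print(np.allclose(b, K @ p))  # True
```

Since $Kp = \sum_k p_k\,(v_k\otimes u_k) = \sum_k p_k\operatorname{vec}(u_kv_k^T)$, the identity is just the rank-one expansion of $B$ written in vectorized form.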
$$\eqalign{ B &= U\,\Diag{p}\;V^T \\ b &= Kp \\ db &= K\,\c{dp} \\ &= K\c{Q\,ds} \\ }$$ Substitute this into the differential of $\g$ $$\eqalign{ d\g &= G:dB \\ &= g^T\c{db} \\ &= g^T\c{KQ\,ds} \\ }$$ This post provides a formula for the gradient of the singular values
$$\eqalign{ d\s_k &= u_k v_k^T:dA \\ s &= \sum_{k=1}^r \e_k\star \s_k \\ ds &= \sum_{k=1}^r \e_k\star d\s_k \;=\; \LR{\sum_{k=1}^r \e_k\star u_k v_k^T}:dA \\ }$$ which yields the desired gradient $$\eqalign{ d\g &= g^TKQ\,ds \\ &= \LR{\sum_{k=1}^r\CLR{g^TKQ\e_k}\LR{u_k v_k^T}}:dA \\ &= \LR{\sum_{k=1}^r\c{\l_k} u_k v_k^T}:dA \\ \grad{\g}{A} &= \sum_{k=1}^r {\l_k u_k v_k^T} \;=\; ULV^T \\ }$$ where $L$ is the diagonal matrix whose diagonal entries are $\l_1,\ldots,\l_r$.
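Here is a finite-difference check of $\partial\gamma/\partial A = ULV^T$, sketched in NumPy. The derivation varies only the singular values (no $dU$ or $dV$ terms appear), so the sketch uses the illustrative choice $\gamma(B) = \tfrac12\|B\|_F^2$: its gradient $G = B = UPV^T$ shares the singular vectors of $A$, and for such a $G$ the $dU$ and $dV$ contributions vanish. It also assumes $\beta=\tanh$ and computes $\lambda_k$ in the simplified form $q_k\,u_k^TGv_k$, which equals $g^TKQ\varepsilon_k$ because $K\varepsilon_k = \operatorname{vec}(u_kv_k^T)$. All function names are hypothetical.

```python
import numpy as np

def beta(s):
    return np.tanh(s)              # illustrative choice of the scalar function beta

def beta_prime(s):
    return 1.0 - np.tanh(s) ** 2   # derivative of tanh

def gamma_of_A(A):
    """gamma(B) = 0.5 * ||B||_F^2 with B = U beta(S) V^T (illustrative choice of gamma)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U @ np.diag(beta(s)) @ Vt
    return 0.5 * np.sum(B * B)

def grad_formula(A):
    """dgamma/dA = U L V^T with lambda_k = q_k u_k^T G v_k, where G = dgamma/dB = B."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T
    G = U @ np.diag(beta(s)) @ Vt      # gradient of 0.5*||B||_F^2 with respect to B is B
    q = beta_prime(s)
    lam = q * np.einsum("ik,ij,jk->k", U, G, V)
    return U @ np.diag(lam) @ Vt

def grad_finite_diff(f, A, h=1e-6):
    """Entrywise central differences: (f(A + h E_ij) - f(A - h E_ij)) / (2h)."""
    grad = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            grad[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return grad

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
print(np.allclose(grad_formula(A), grad_finite_diff(gamma_of_A, A), atol=1e-6))  # True
```

For this $\gamma$, $\lambda_k = q_k\,u_k^TUPV^Tv_k = q_kp_k$, so the gradient reduces to $U\operatorname{Diag}(p\odot q)V^T$.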

greg
  • Thanks for the detailed response. May I clarify when you wrote $d\gamma = G : dB$, what does the colon mean? I'm not very familiar with differentials so I would appreciate it if you could point me to some references that explain the notations. – Steve Apr 04 '22 at 09:41
  • @LamChiThanh The colon denotes the Frobenius inner product for matrices $$A:B\;=\;{\rm trace}\!\left(A^TB\right)$$ In engineering disciplines it is often called the double-dot product. – greg Apr 04 '22 at 14:04
  • As for reference material, the standard text is probably Magnus and Neudecker's Matrix Differential Calculus, although personally I prefer Hjørungnes's Complex-Valued Matrix Derivatives. – greg Apr 04 '22 at 14:41
  • Thank you so much! Really appreciate your help. – Steve Apr 04 '22 at 14:45
  • Just one more clarifying question, what is the $\star$ in $\epsilon_k \star u_k v_k^\top$? Does that refer to matrix multiplication? – Steve Apr 04 '22 at 15:17
  • @LamChiThanh That is my own non-standard notation for a dyadic product. Older engineering texts often use the notation $(a\otimes b)$, but these days that symbol is reserved for the Kronecker product. In continuum mechanics, simple juxtaposition is used (i.e. $ab$), but that notation is too easily confused with the standard matrix product. So the quantity $(\varepsilon\star uv^T = \varepsilon\star u\star v)$ actually represents a triadic or third-order tensor. – greg Apr 04 '22 at 15:57
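The two notations explained in these comments can be illustrated in NumPy. The first is the colon product $X:Y = \operatorname{Tr}(X^TY)$. The second is the third-order tensor $\sum_k \varepsilon_k\star u_k\star v_k$ from the answer, stored here as an array `T[k, i, j]`. Contracting its last two indices with $dA$ gives the vector $ds$, whose entries $u_k^T\,dA\,v_k$ are compared with finite differences of the singular values. The array layout and the helper name `frob` are choices made for this sketch, and it assumes distinct singular values.

```python
import numpy as np

def frob(X, Y):
    """Frobenius inner product X:Y = Tr(X^T Y) = sum_ij X_ij Y_ij."""
    return np.sum(X * Y)

rng = np.random.default_rng(2)
m, n = 4, 3
A = rng.standard_normal((m, n))
dA = rng.standard_normal((m, n))

# The colon product computed two ways
print(np.isclose(frob(A, dA), np.trace(A.T @ dA)))  # True

# Third-order tensor T = sum_k e_k * u_k * v_k, stored as T[k, i, j] = U[i, k] * V[j, k]
U, s, Vt = np.linalg.svd(A, full_matrices=False)
T = np.einsum("ik,jk->kij", U, Vt.T)

# T : dA contracts the last two indices, giving ds_k = u_k^T dA v_k
ds_pred = np.einsum("kij,ij->k", T, dA)

# Central finite difference of the singular values in the direction dA
h = 1e-6
ds_fd = (np.linalg.svd(A + h * dA, compute_uv=False)
         - np.linalg.svd(A - h * dA, compute_uv=False)) / (2 * h)
print(np.allclose(ds_pred, ds_fd, atol=1e-6))  # True
```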