
Let $A$ and $B$ be $m \times n$ matrices of rank $ k_1 \le \min(m,n) $ and $ k_2 \le \min(m,n) $, respectively. Then the QZ decomposition (also called the generalized Schur decomposition) is $A = USV^T$ and $B = UTV^T $, where:

  • $U$ and $V$ are unitary matrices.

  • $S$ and $T$ are upper triangular matrices.

We define a function $ f$ which takes $U$, $S$, $T$ and $V$ as input and returns the sum of all the elements of all the matrices. I am interested in finding the gradient of $f$ with respect to $A$ and $B$.
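For concreteness, here is a minimal numerical sketch (my own addition, not part of the original setup) of the factors and of $f$, using SciPy's `scipy.linalg.qz`. Note that `qz` expects square matrices, and with `output='real'` the factor $S$ may only be quasi upper triangular; the mapping to the notation above is $U=Q$, $S=AA$, $T=BB$, $V=Z$:

```python
import numpy as np
from scipy.linalg import qz

# Sketch: compute the QZ factors with SciPy and evaluate f.
# scipy.linalg.qz requires square inputs, so take m = n here.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# SciPy returns AA, BB, Q, Z with A = Q @ AA @ Z.T and B = Q @ BB @ Z.T,
# i.e. S, T, U, V in the question's notation.
S, T, U, V = qz(A, B, output='real')

def f(U, S, T, V):
    # Sum of all the elements of all four factor matrices.
    return U.sum() + S.sum() + T.sum() + V.sum()

print(f(U, S, T, V))
print(np.allclose(A, U @ S @ V.T), np.allclose(B, U @ T @ V.T))
```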


Reading some posts here and a few papers, I found the following manipulations useful, but I could not complete the derivation. \begin{align*} dA &= dU\,SV^{T} + U\,dS\,V^{T} + US\,dV^{T} \\ U^{T}dA\,V &= U^{T}dU\,S + dS + S\,dV^{T}V \\ U^{T}dA\,VS^{-1} &= U^{T}dU + dS\,S^{-1} + S\,dV^{T}VS^{-1} \end{align*} Since $U^{T}dU$ and $dV^{T}V$ are skew-symmetric, we get \begin{align*} U^{T}dAVS^{-1} - dSS^{-1} - SdV^{T}VS^{-1} = -(U^{T}dAVS^{-1} - dSS^{-1} - SdV^{T}VS^{-1})^T \end{align*} which simplifies to \begin{align*} &U^{T}dAVS^{-1} - SdV^{T}VS^{-1} + (U^{T}dAVS^{-1} - SdV^{T}VS^{-1})^T = dSS^{-1} + (dSS^{-1} )^T \\ & \operatorname{sym}(C) = \operatorname{sym}(dSS^{-1}) \end{align*} where $C = U^{T}dAVS^{-1} - SdV^{T}VS^{-1}$ and $\operatorname{sym}(M) = \tfrac12(M + M^T)$. Since $dSS^{-1}$ is upper triangular, we can write $dS$ as $dS = (\operatorname{sym}(C) \circ E^T)S$, where $\circ$ is the Hadamard product and $E$ is the matrix with entries $$ e_{ij}= \begin{cases} 0,& i < j\\ 1, & i=j\\ 2,& i > j \end{cases} $$
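As a sanity check of that last step (my own sketch, not from the post): for any upper-triangular matrix $M$ playing the role of $dS\,S^{-1}$, the mask $E^T$ undoes the symmetrization, i.e. $\operatorname{sym}(M)\circ E^T = M$.

```python
import numpy as np

# Numeric check of the masking identity sym(M) ∘ E^T = M for upper-triangular M.
rng = np.random.default_rng(1)
n = 6
M = np.triu(rng.standard_normal((n, n)))              # stand-in for dS S^{-1}

# e_ij = 0 (i<j), 1 (i=j), 2 (i>j)
E = 2.0 * np.tril(np.ones((n, n)), -1) + np.eye(n)

sym = lambda X: 0.5 * (X + X.T)
print(np.allclose(sym(M) * E.T, M))                   # True: the mask inverts sym()
```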

Let the derivatives of the output with respect to $A$, $B$, $U$, $V$, $S$ and $T$ be $\bar{A}$, $\bar{B}$, $\bar{U}$, $\bar{V}$, $\bar{S}$ and $\bar{T}$. Then $$Tr(\bar{A}dA) = Tr(\bar{U}dU) + Tr(\bar{S}dS) + Tr(\bar{V}dV). $$ If I substitute $dS$ into the above formula, the RHS will be in terms of $dV$ and $dA$. I am trying to find a way to eliminate $dV$ so that the RHS is only in terms of $dA$; then I can equate both sides and find $\bar{A}$.
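For what it is worth, here is how far that substitution seems to go (a sketch on my part, using the Hadamard/trace identity $Tr\big(A(B\circ C)\big)=Tr\big((A\circ C^{T})B\big)$ and writing $G = (S\bar{S})\circ E$): \begin{align*} Tr(\bar{S}\,dS) &= Tr\big(\bar{S}\,(\operatorname{sym}(C)\circ E^{T})\,S\big) = Tr\big(S\bar{S}\,(\operatorname{sym}(C)\circ E^{T})\big) = Tr\big(G\,\operatorname{sym}(C)\big) \\ &= \tfrac12\,Tr\big((G+G^{T})\,U^{T}dA\,VS^{-1}\big) - \tfrac12\,Tr\big((G+G^{T})\,S\,dV^{T}VS^{-1}\big), \end{align*} so, from this term alone, the $dA$-part contributes $\tfrac12\,VS^{-1}(G+G^{T})U^{T}$ to $\bar{A}$ (in the $Tr(\bar{A}\,dA)$ convention above), while the second, $dV$-dependent term is exactly what still needs to be eliminated.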

  1. In the above equation, $U^{T}dU$ and $dV^{T}V$ are skew-symmetric and $dS$ is upper triangular.
  2. I am trying to eliminate $dV$ or $dU$ so that $dS$ and $dT$ depend only on $dA$ and $dB$; then I can find $dU$ and $dV$.
  3. Is there any way to simplify $C$?

I am not sure how to jointly solve for $dU$ and $dV$ such that the answer is expressed in terms of $dA$ and $dB$.

Some references:

Gradient of $A \mapsto \sigma_i (A)$

https://arxiv.org/pdf/1509.07838.pdf

https://j-towns.github.io/papers/qr-derivative.pdf

Edit 1: The main purpose of the question is to find gradients for backpropagation through the QZ decomposition.

newbie
  • Given a factorization, e.g. $A=XYZ,$ it is reasonable to calculate a quantity such as $\def\p{\partial}\def\g#1#2{\frac{\p #1}{\p #2}} \g AX$, but not $\g XA$! The reason is that factorizations, even well-established ones like the SVD or Eigenvalue Decomposition, are not unique. Simply reordering the elements along the diagonal will change all of the factors. Even prescribing the ordering as high-to-low (or low-to-high) is not enough, since duplicate eigenvalues can occur. – lynn Sep 19 '22 at 00:24
  • Those quantities are well established for SVD and Eigenvalue Decomposition, so I am not sure why it is not possible for QZ. – newbie Sep 19 '22 at 14:35

1 Answer


$ \def\l{\lambda}\def\s{\sigma}\def\e{\varepsilon} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\deriv#1#2{\frac{d #1}{d #2}} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $Here is a partial answer, assuming that the change in $A$ is in the direction of $B-A$, so that $$\eqalign{ dA &= (B-A)\:d\l \qiq \c{dU = dV = 0} \\ }$$ Let $J_X$ denote an all-ones matrix the same size as $X$; then the cost function is $$\eqalign{ \phi &= J_S:S + J_T:T + J_U:U + J_V:V \\ }$$ Differentiating with respect to $A$ simplifies the problem, since $\:dB=0\implies\c{dT=0}$ $$\eqalign{ d\phi &= J_S:dS + J_T:\c{dT} + J_U:\c{dU} + J_V:\c{dV} \\ &= J_S:dS \\ &= J_S:(U^TdA\,V) \\ &= (UJ_SV^T):dA \\ &= (UJ_SV^T):(B-A)\,d\l \\ \grad{\phi}{A} &= UJ_SV^T \\ \deriv{\phi}{\l} &= (UJ_SV^T):(B-A) \\ }$$
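A minimal numeric sketch of these formulas (my own, valid only under the same assumption $dU = dV = 0$, and reusing the `scipy.linalg.qz` setup from the question above):

```python
import numpy as np
from scipy.linalg import qz

# Evaluate the answer's expressions grad_A = U J_S V^T and
# dphi/dlambda = (U J_S V^T):(B - A), assuming dU = dV = 0.
rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
S, T, U, V = qz(A, B, output='real')       # A = U S V^T,  B = U T V^T

J_S = np.ones_like(S)                      # all-ones matrix, same size as S
grad_A = U @ J_S @ V.T                     # claimed d(phi)/dA
dphi_dlam = np.sum(grad_A * (B - A))       # Frobenius product (U J_S V^T):(B-A)
print(grad_A.shape, dphi_dlam)
```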


In the above, a colon is used to denote the Frobenius product, which is a concise notation for the trace $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ A:A &= \|A\|^2_F \\ }$$ The properties of the underlying trace function allow the terms in such a product to be rearranged in many different ways, e.g. $$\eqalign{ A:B &= B:A \\ A:B &= A^T:B^T \\ C:\LR{AB} &= \LR{CB^T}:A \\&= \LR{A^TC}:B \\ }$$
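A quick numerical check of these rearrangement rules (again a sketch of mine, not part of the answer):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((4, 5))
frob = lambda X, Y: np.trace(X.T @ Y)                  # X:Y = Tr(X^T Y)

print(np.isclose(frob(A, B), np.sum(A * B)))           # elementwise definition
print(np.isclose(frob(A, B), frob(B, A)))              # A:B = B:A
print(np.isclose(frob(A, B), frob(A.T, B.T)))          # A:B = A^T:B^T
M = rng.standard_normal((5, 3))
C = rng.standard_normal((4, 3))
print(np.isclose(frob(C, A @ M), frob(C @ M.T, A)))    # C:(AB) = (CB^T):A
print(np.isclose(frob(C, A @ M), frob(A.T @ C, M)))    # C:(AB) = (A^T C):B
```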

greg