Let $A$ and $B$ be $n \times n$ matrices of ranks $k_1 \le n$ and $k_2 \le n$ (the QZ decomposition is defined for square pencils). Then the QZ decomposition, also called the generalized Schur decomposition, of the pair $(A, B)$ is $A = USV^T$ and $B = UTV^T$, where:
- $U$ and $V$ are orthogonal (unitary in the complex case, with $^T$ replaced by the conjugate transpose $^H$);
- $S$ and $T$ are upper triangular.
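As a quick sanity check of the definition, the sketch below computes the decomposition with SciPy's `scipy.linalg.qz`, which returns `AA, BB, Q, Z` with $A = Q\,AA\,Z^H$ and $B = Q\,BB\,Z^H$, and maps its outputs to the names $U, S, T, V$ used here. The random test matrices are an assumption for illustration. It uses `output='complex'` so that both $S$ and $T$ are genuinely upper triangular; with the default `output='real'`, $S$ is only quasi-upper-triangular (it can have $2\times 2$ diagonal blocks for complex-conjugate eigenvalue pairs) while $T$ is upper triangular.

```python
import numpy as np
from scipy.linalg import qz

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Complex QZ: A = Q @ AA @ Z^H and B = Q @ BB @ Z^H,
# with Q, Z unitary and AA, BB upper triangular.
AA, BB, Q, Z = qz(A, B, output='complex')
U, S, T, V = Q, AA, BB, Z  # names used in the question

recon_ok = np.allclose(U @ S @ V.conj().T, A) and np.allclose(U @ T @ V.conj().T, B)
unitary_ok = np.allclose(U.conj().T @ U, np.eye(n)) and np.allclose(V.conj().T @ V, np.eye(n))
triangular_ok = np.allclose(S, np.triu(S)) and np.allclose(T, np.triu(T))
print(recon_ok, unitary_ok, triangular_ok)  # True True True
```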
We define a function $f$ that takes $U$, $S$, $T$ and $V$ as input and returns the sum of all entries of all four matrices. I am interested in finding the gradient of $f$ with respect to $A$ and $B$.
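One detail worth keeping in mind: $f$ is only a function of $(A, B)$ once a particular decomposition is fixed, because the factors are not unique. For a diagonal sign matrix $D$ (entries $\pm 1$), $(UD)(DS)V^T = USV^T$ and $(UD)(DT)V^T = UTV^T$, and $DS$, $DT$ keep their (quasi-)triangular structure, yet $f$ generally changes. The sketch below illustrates this on random real matrices using `scipy.linalg.qz` with `output='real'` and a hypothetical helper `f`; it suggests that any gradient of $f$ with respect to $A$ and $B$ depends on the sign and ordering conventions of the routine that computes the QZ.

```python
import numpy as np
from scipy.linalg import qz

def f(U, S, T, V):
    # Hypothetical helper: sum of all entries of all four factors.
    return U.sum() + S.sum() + T.sum() + V.sum()

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

# Real QZ: A = U @ S @ V.T, B = U @ T @ V.T
S, T, U, V = qz(A, B, output='real')

# Flip the sign of the first column of U and the first rows of S and T.
D = np.eye(n)
D[0, 0] = -1.0
U2, S2, T2 = U @ D, D @ S, D @ T

# Both factorizations reproduce A and B ...
print(np.allclose(U2 @ S2 @ V.T, A), np.allclose(U2 @ T2 @ V.T, B))  # True True
# ... but f takes different values on them.
print(f(U, S, T, V), f(U2, S2, T2, V))
```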
After reading some posts here and a few papers, I found the following tricks useful, but I could not complete the derivation. Differentiating $A = USV^T$, multiplying on the left by $U^T$ and on the right by $V$, and then on the right by $S^{-1}$ (assuming $S$ is invertible, i.e. $A$ is nonsingular) gives
\begin{align*} dA &= dU\,SV^{T} + U\,dS\,V^{T} + US\,dV^{T} \\ U^{T}dA\,V &= U^{T}dU\,S + dS + S\,dV^{T}V \\ U^{T}dA\,VS^{-1} &= U^{T}dU + dS\,S^{-1} + S\,dV^{T}VS^{-1} \end{align*}
Since $U^{T}dU$ and $dV^{T}V$ are skew-symmetric (differentiate $U^TU = I$ and $V^TV = I$), we get
\begin{align*} U^{T}dA\,VS^{-1} - dS\,S^{-1} - S\,dV^{T}VS^{-1} = -\left(U^{T}dA\,VS^{-1} - dS\,S^{-1} - S\,dV^{T}VS^{-1}\right)^T \end{align*}
which simplifies to
\begin{align*} C + C^T &= dS\,S^{-1} + (dS\,S^{-1})^T \\ \operatorname{sym}(C) &= \operatorname{sym}(dS\,S^{-1}) \end{align*}
where $C = U^{T}dA\,VS^{-1} - S\,dV^{T}VS^{-1}$ and $\operatorname{sym}(X) = \tfrac{1}{2}(X + X^T)$. Since $dS\,S^{-1}$ is upper triangular (a product of two upper triangular matrices), it is determined by its symmetric part, so we can write $dS = (\operatorname{sym}(C) \circ E^T)\,S$, where $\circ$ is the Hadamard product and $E$ is
$$ e_{ij}= \begin{cases} 0,& i < j\\ 1, & i=j\\ 2,& i > j \end{cases} $$
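The last step can be checked numerically without computing a QZ decomposition at all. The sketch below assumes synthetic data: orthogonal $U, V$ from a QR factorization, a well-conditioned upper triangular $S$, tangents $dU = K_1U$ and $dV = K_2V$ with $K_1, K_2$ skew-symmetric (so that $U^TdU$ and $dV^TV$ are skew-symmetric), and an arbitrary upper triangular $dS$. It forms $dA$ from the product rule and confirms that $(\operatorname{sym}(C)\circ E^T)\,S$ recovers $dS$, with $\operatorname{sym}(X) = \tfrac12(X + X^T)$. Note that it needs $dV$ as an input, which is exactly the quantity the question is trying to eliminate.

```python
import numpy as np

def sym(X):
    # Symmetric part: sym(X) = (X + X^T) / 2
    return 0.5 * (X + X.T)

def e_transpose(n):
    # E has 0 above the diagonal, 1 on it, 2 below it; return E^T.
    E = 2.0 * np.tril(np.ones((n, n)), -1) + np.eye(n)
    return E.T

rng = np.random.default_rng(0)
n = 5
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)  # well-conditioned

# Tangent directions: dU = K1 U and dV = K2 V with K1, K2 skew-symmetric.
K1 = rng.standard_normal((n, n))
K1 = K1 - K1.T
K2 = rng.standard_normal((n, n))
K2 = K2 - K2.T
dU = K1 @ U
dV = K2 @ V
dS = np.triu(rng.standard_normal((n, n)))

# Product rule for A = U S V^T
dA = dU @ S @ V.T + U @ dS @ V.T + U @ S @ dV.T

S_inv = np.linalg.inv(S)
C = U.T @ dA @ V @ S_inv - S @ dV.T @ V @ S_inv
dS_rec = (sym(C) * e_transpose(n)) @ S  # '*' is the Hadamard product

print(np.allclose(dS_rec, dS))  # True
```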
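Let the derivatives of the output $f$ with respect to $A$, $B$, $U$, $V$, $S$ and $T$ be $\bar{A}$, $\bar{B}$, $\bar{U}$, $\bar{V}$, $\bar{S}$ and $\bar{T}$, each with the same shape as the corresponding matrix. Then $$\operatorname{Tr}(\bar{A}^T dA) + \operatorname{Tr}(\bar{B}^T dB) = \operatorname{Tr}(\bar{U}^T dU) + \operatorname{Tr}(\bar{S}^T dS) + \operatorname{Tr}(\bar{T}^T dT) + \operatorname{Tr}(\bar{V}^T dV).$$ If I substitute $dS$ into this formula, the $dS$ term is expressed in terms of $dV$ and $dA$. I am trying to find a way to eliminate $dV$ so that the RHS is expressed only in terms of $dA$ (and $dB$); then I can equate both sides and read off $\bar{A}$ (and $\bar{B}$).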
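With $f$ equal to the sum of all entries, every adjoint $\bar U, \bar S, \bar T, \bar V$ is the all-ones matrix. Under the convention that $\bar X = \partial f/\partial X$ has the same shape as $X$, each chain-rule term is $\operatorname{Tr}(\bar X^T dX)$, i.e. the entrywise sum of $\bar X \circ dX$. The short sketch below (random factors and differentials, hypothetical helper `f`) checks that the right-hand side computed this way matches the directional derivative of $f$; since $f$ is linear, the difference quotient is exact up to rounding.

```python
import numpy as np

def f(U, S, T, V):
    # Hypothetical helper: sum of all entries of all four factors.
    return U.sum() + S.sum() + T.sum() + V.sum()

rng = np.random.default_rng(0)
n = 4
U, S, T, V = (rng.standard_normal((n, n)) for _ in range(4))
dU, dS, dT, dV = (rng.standard_normal((n, n)) for _ in range(4))

# For f = sum of entries, every adjoint is the all-ones matrix.
bar = np.ones((n, n))
rhs = sum(np.trace(bar.T @ d) for d in (dU, dS, dT, dV))

# Directional derivative of f along (dU, dS, dT, dV).
h = 1e-3
lhs = (f(U + h * dU, S + h * dS, T + h * dT, V + h * dV) - f(U, S, T, V)) / h
print(np.isclose(lhs, rhs))  # True
```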
- In the equations above, $U^{T}dU$ and $dV^{T}V$ are skew-symmetric, and $dS$ (likewise $dT$) is upper triangular.
- I am trying to eliminate $dV$ or $dU$ so that $dS$ and $dT$ depend only on $dA$ and $dB$; then I can find $dU$ and $dV$.
- Is there any way to simplify $C$?
I am not sure how to solve jointly for $dU$ and $dV$ so that the result is expressed only in terms of $dA$ and $dB$.
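Writing $\Omega_U = U^T dU$ and $\Omega_V = V^T dV$ (both skew-symmetric) and repeating the first step for $B = UTV^T$ gives $U^T dA\,V = \Omega_U S + dS - S\Omega_V$ and $U^T dB\,V = \Omega_U T + dT - T\Omega_V$. Because $dS$ and $dT$ are upper triangular, the strictly lower triangular parts of these two equations involve only $\Omega_U$ and $\Omega_V$: $n(n-1)$ scalar equations for the $n(n-1)$ free entries of the two skew matrices, which is where the coupling between $dU$ and $dV$ comes from. The sketch below only checks these relations numerically on synthetic differentials (random orthogonal $U, V$, well-conditioned triangular $S, T$); it does not solve the system.

```python
import numpy as np

def skew(M):
    # M - M^T is skew-symmetric for any square M.
    return M - M.T

def strictly_lower(X):
    # Entries strictly below the diagonal.
    return np.tril(X, -1)

rng = np.random.default_rng(1)
n = 4
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
S = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)
T = np.triu(rng.standard_normal((n, n))) + n * np.eye(n)

# Omega_U = U^T dU and Omega_V = V^T dV are skew-symmetric.
Omega_U = skew(rng.standard_normal((n, n)))
Omega_V = skew(rng.standard_normal((n, n)))
dU, dV = U @ Omega_U, V @ Omega_V
dS = np.triu(rng.standard_normal((n, n)))
dT = np.triu(rng.standard_normal((n, n)))

# Product rule for A = U S V^T and B = U T V^T (shared U and V).
dA = dU @ S @ V.T + U @ dS @ V.T + U @ S @ dV.T
dB = dU @ T @ V.T + U @ dT @ V.T + U @ T @ dV.T

# Strictly lower parts: dS and dT drop out.
ok_A = np.allclose(strictly_lower(U.T @ dA @ V), strictly_lower(Omega_U @ S - S @ Omega_V))
ok_B = np.allclose(strictly_lower(U.T @ dB @ V), strictly_lower(Omega_U @ T - T @ Omega_V))

# Given Omega_U and Omega_V, the remaining parts give dS and dT.
ok_dS = np.allclose(U.T @ dA @ V - Omega_U @ S + S @ Omega_V, dS)
ok_dT = np.allclose(U.T @ dB @ V - Omega_U @ T + T @ Omega_V, dT)
print(ok_A, ok_B, ok_dS, ok_dT)  # True True True True
```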
Some references:
- Gradient of $A \mapsto \sigma_i (A)$
- https://arxiv.org/pdf/1509.07838.pdf
- https://j-towns.github.io/papers/qr-derivative.pdf
Edit 1: The main purpose of this question is to find gradients for backpropagation through the QZ decomposition.