Why is the gradient of this determinant, $\det(I - AA^\dagger)$, off by a factor of $2$?

Question

I have banged my head on this for a couple of hours and I can't find what's wrong:

$$ \begin{align} \frac{\partial}{\partial A_{pq}}\det(\mathbb{1} - AA^\dagger) &= \det(\mathbb{1} - AA^\dagger)\mathrm{tr}\left[(\mathbb{1} - AA^\dagger)^{-1}\frac{\partial(\mathbb{1} - AA^\dagger)}{\partial A_{pq}}\right]\\ &=\det(\mathbb{1} - AA^\dagger)\sum_{mnk}(\mathbb{1} - AA^\dagger)^{-1}_{mn}\frac{\partial(\mathbb{1}_{nm} - A_{nk}A^\dagger_{km})}{\partial A_{pq}}\\ &= \det(\mathbb{1} - AA^\dagger)\sum_{mnk}(\mathbb{1} - AA^\dagger)^{-1}_{mn}\delta_{np}\delta_{kq}(-A^\dagger_{km})\\ &=\det(\mathbb{1} - AA^\dagger)\sum_{m}(\mathbb{1} - AA^\dagger)^{-1}_{mp}(-A^\dagger_{qm})\\ &=-\det(\mathbb{1} - AA^\dagger)(\mathbb{1} - AA^\dagger)^{-T}A^* \end{align} $$

When I try it out numerically, though, it's off by a factor of 2:

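import numpy as np

def det(A):
    # det(1 - A A^dagger)
    return np.linalg.det(np.identity(A.shape[0]) - A @ np.conj(A.T))
def ddet(A):
    # gradient as derived above: -det * (1 - A A^dagger)^{-T} A^*
    return - det(A) * np.linalg.inv(np.identity(A.shape[0]) - A @ np.conj(A.T)).T @ np.conj(A)
delta = 1e-6  # finite-difference step, defined before it is used in A2
A1 = np.array([[0.1, 0.2],[0.2, 0.3]])
A2 = np.array([[0.1, 0.2],[0.2, 0.3+delta]])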
np.isclose((det(A2) - det(A1))/delta, 2*ddet(A1)[1,1])
>>> True

When the entries of $A$ are complex it's even more wrong, but let's solve the 2x issue first.

Are you sure you're not forgetting to multiply by 2 from differentiating the quadratic $A_{pq}^2$ for the chain rule? — Frank Seidl, Mar 12 '21 at 22:16
I thought about that, but I'm computing $\frac{\partial}{\partial A}$ independently from $\frac{\partial}{\partial A^*}$ (as per Wirtinger calculus), so the expression in the determinant is linear in $A$. — Ziofil, Mar 12 '21 at 22:18
Does $\dagger$ denote the Hermitian transpose? Is $\mathbb{1}$ the identity matrix or a matrix of $1$s? — Rodrigo de Azevedo, Mar 13 '21 at 04:22

greg · Accepted Answer · 2021-03-14T15:34:09.170

For typing convenience define the variables
$$\eqalign{ P &= \Big(I-AA^H\Big) = P^H \quad\implies\quad P^T=P^*\\ \phi &= \det(P) }$$
and denote the trace as a colon product
$$A:B = {\rm Tr}(A^TB)$$

Then calculate the Wirtinger differential via Jacobi's formula
$$\eqalign{ d\phi &= \phi\;P^{-T}:dP \\ &= -\phi\;P^{-T}:\big(dA\,A^H+A\,dA^H\big) \\ &= -\phi\;P^{-T}A^*:dA -\phi\;A^TP^{-T}:dA^H \\ &= -\phi\;\big(P^{-1}A\big)^*:dA -\phi\;\big(P^{-1}A\big):dA^* }$$

When $A$ is real, $\,A^*=A,\,$ $P^*=P\,$ and
$$\eqalign{ d\phi &= -2\phi\;\big(P^{-1}A\big):dA \\ \frac{\partial \phi}{\partial A} &= -2\phi\;P^{-1}A }$$

In the general case
$$\eqalign{ \frac{\partial \phi}{\partial A} &= -\phi\;\big(P^{-1}A\big)^* \\ \frac{\partial \phi}{\partial A^*} &= -\phi\;P^{-1}A }$$

NB: In order to test the complex case, your Python code should estimate the total differential instead of the gradient, i.e. by validating expressions such as the following
$$\eqalign{ A &= A_1 \quad\qquad&\implies\quad dA = A_2-A_1 \\ P &= I - AA^H \quad\qquad&\implies\quad dP = -dA\,A^H-A\,dA^H }$$
$$\frac{-d\phi}{\phi} = \frac{\det(P)-\det(P+dP)}{\det(P)} \quad \overset{?}{=} \quad {\rm Tr}\Big(\big(P^{-1}A\big)^HdA + \big(P^{-1}A\big)^T\,dA^*\Big)$$

If you insist on using the gradient, then you must use multivariate Wirtinger derivatives in order to code the correct expression.
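A minimal NumPy sketch of the suggested test may help here. It is an illustration under stated assumptions, not part of the original answer: the function names (`phi`, `predicted_rel_change`, `real_gradient`, `fd_gradient`), the complex test matrices, the step sizes, and the tolerances are all hypothetical choices. It also evaluates $\det(I-(A+dA)(A+dA)^H)$ directly instead of forming $P+dP$, which differs from the linearized expression only at second order in $dA$.

```python
import numpy as np

def phi(A):
    """phi(A) = det(I - A A^H)."""
    return np.linalg.det(np.eye(A.shape[0]) - A @ A.conj().T)

def predicted_rel_change(A, dA):
    """First-order value of -dphi/phi: Tr(G^H dA + G^T dA^*), with G = P^{-1} A."""
    P = np.eye(A.shape[0]) - A @ A.conj().T
    G = np.linalg.solve(P, A)  # G = P^{-1} A without forming the inverse
    return np.trace(G.conj().T @ dA + G.T @ dA.conj())

def real_gradient(A):
    """Gradient for real A: -2 * phi * P^{-1} A."""
    P = np.eye(A.shape[0]) - A @ A.T
    return -2.0 * np.linalg.det(P) * np.linalg.solve(P, A)

def fd_gradient(f, A, h=1e-6):
    """Entrywise central finite differences of a real-valued f at real A."""
    g = np.zeros_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            E = np.zeros_like(A)
            E[i, j] = h
            g[i, j] = (f(A + E) - f(A - E)) / (2 * h)
    return g

# Complex case: compare the total differential (illustrative matrices).
A = np.array([[0.1 + 0.2j, 0.2 - 0.1j],
              [0.2 + 0.3j, 0.3 + 0.1j]])
dA = 1e-6 * np.array([[1.0 - 2.0j, 0.5 + 1.0j],
                      [-1.0 + 0.5j, 2.0 + 1.0j]])

lhs = (phi(A) - phi(A + dA)) / phi(A)   # -dphi/phi, measured
rhs = predicted_rel_change(A, dA)       # -dphi/phi, predicted
complex_ok = np.isclose(lhs, rhs, rtol=1e-3, atol=0.0)

# Keeping only the dA term (and dropping the dA^* term) should not match.
P = np.eye(2) - A @ A.conj().T
naive = np.trace(np.linalg.solve(P, A).conj().T @ dA)
naive_ok = np.isclose(lhs, naive, rtol=1e-3, atol=0.0)

# Real case: the matrix from the question, checked against -2 phi P^{-1} A.
A_real = np.array([[0.1, 0.2], [0.2, 0.3]])
real_ok = np.allclose(fd_gradient(phi, A_real), real_gradient(A_real),
                      rtol=1e-6, atol=1e-8)

print(complex_ok, naive_ok, real_ok)
```

The real-case check uses the matrix from the question, so the factor of 2 appears directly as the difference between `real_gradient` and the single Wirtinger term.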
