The problem:
Let $S \in \mathbb{C}^{N\times M}$ with $N > M$ and $S^{H}S=\mathbb{I}$, let $\rho \in \mathbb{C}^{M\times M}$ and $\sigma \in \mathbb{C}^{N\times N}$ be Hermitian matrices of trace $1$, and define the function $D: \mathbb{C}^{N\times M} \rightarrow \mathbb{R}$ as:
$$D(S) = \text{tr}(|S\rho S^{H} - \sigma|),$$
with $|X| = (X^{H}X)^{1/2}$ and $^H$ denoting the Hermitian (conjugate) transpose, i.e., $D$ is the trace distance (without the conventional factor $\tfrac12$). My goal is to compute $\nabla_S D(S)$, the gradient of $D$ w.r.t. $S$.
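For experimenting with the derivation, a minimal NumPy sketch of $D$ may help. It assumes $\rho$ and $\sigma$ are density matrices (positive semidefinite, trace $1$) and uses the fact that $S\rho S^H - \sigma$ is Hermitian, so its trace norm is the sum of the absolute values of its eigenvalues. The helper names and the random test instance are illustrative only.

```python
import numpy as np

def D(S, rho, sigma):
    """tr|S rho S^H - sigma| with |X| = (X^H X)^{1/2}.

    X is Hermitian, so tr|X| equals the sum of |eigenvalues| of X.
    """
    X = S @ rho @ S.conj().T - sigma
    X = 0.5 * (X + X.conj().T)  # remove round-off asymmetry
    return float(np.sum(np.abs(np.linalg.eigvalsh(X))))

def random_instance(N, M, seed=0):
    """Illustrative data: isometry S (S^H S = I) and density matrices rho, sigma."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
    S, _ = np.linalg.qr(Z)  # reduced QR: S is N x M with orthonormal columns

    def density(n):
        G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        P = G @ G.conj().T  # Hermitian positive semidefinite
        return P / np.trace(P).real

    return S, density(M), density(N)

S, rho, sigma = random_instance(N=5, M=3)
print(D(S, rho, sigma))
```

Because $S\rho S^H$ and $\sigma$ are then both positive semidefinite with unit trace, the value printed should lie in $[0, 2]$.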
My approach:
I defined the following variables:
$$A = S\rho S^H - \sigma$$ $$B = A^H A.$$
$D$ then becomes:
$$D = \text{tr}(B^{1/2})$$
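Since $A = S\rho S^H - \sigma$ is Hermitian, $B = A^HA = A^2$. Diagonalizing $A = U\Lambda U^H$ with real eigenvalues $\lambda_i$ makes the objective concrete:

$$B^{1/2} = U\,|\Lambda|\,U^H, \qquad D = \text{tr}(B^{1/2}) = \sum_i |\lambda_i(A)|.$$

In particular, $D$ is differentiable wherever no $\lambda_i$ is zero, which is also exactly where $B^{-1/2}$ exists.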
The goal is now to take the differential of $D$ and rearrange terms to eventually arrive at something like:
$$dD = \text{tr} (K dS),$$
with $K^T$ being the gradient we're looking for.
My progress so far:
$$dD = d\big(\text{tr}(B^{1/2})\big) = \text{tr}\big(d(B^{1/2})\big)$$ $$dD = \frac{1}{2}\text{tr}(B^{-1/2}\, dB)$$
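The scalar identity behind this step, $d\,\text{tr}(B^{1/2}) = \tfrac12\,\text{tr}(B^{-1/2}\,dB)$ for Hermitian positive definite $B$ (written in the $\text{tr}(K\,dX)$ form, with no transpose), can be sanity-checked numerically. This is a sketch assuming a random positive definite $B$ and a random Hermitian direction $dB$; `herm_fun` and `tr_sqrt` are hypothetical helpers.

```python
import numpy as np

def herm_fun(B, f):
    """f(B) for Hermitian B via B = U diag(w) U^H."""
    w, U = np.linalg.eigh(B)
    return (U * f(w)) @ U.conj().T

def tr_sqrt(B):
    return float(np.trace(herm_fun(B, np.sqrt)).real)

rng = np.random.default_rng(1)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = G @ G.conj().T + np.eye(n)  # Hermitian positive definite
H = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
dB = 0.5 * (H + H.conj().T)     # Hermitian direction

t = 1e-6
# central finite difference of tr(B^{1/2}) along dB
numeric = (tr_sqrt(B + t * dB) - tr_sqrt(B - t * dB)) / (2 * t)
# closed form 1/2 tr(B^{-1/2} dB)
analytic = 0.5 * np.trace(herm_fun(B, lambda w: w ** -0.5) @ dB).real
print(numeric, analytic)
```

The two printed numbers should agree to several digits; if $B$ is singular, $B^{-1/2}$ does not exist and the check (like the formula) breaks down.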
We have:
$$dB = (dA)^HA + A^HdA$$
And:
$$dA = dS\rho S^H + S\rho (dS)^H$$
Substituting, I get terms with $dS$ and terms with $(dS)^H$, and I'm not sure how to manipulate them to reach an expression from which I can read off the gradient. Is this even the (or a) right approach?
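One possible way to collect the terms, as a sketch assuming $A = S\rho S^H - \sigma$ has no zero eigenvalue (so $B$ is invertible and $D$ is differentiable at $S$): since $\rho$ and $\sigma$ are Hermitian, both $A$ and $dA$ are Hermitian, so $dB = dA\,A + A\,dA$, and $B^{-1/2} = |A|^{-1}$ commutes with $A$. With $\operatorname{sign}(A) = A\,|A|^{-1}$ and cyclicity of the trace:

$$dD = \tfrac12\,\text{tr}\big(B^{-1/2}(dA\,A + A\,dA)\big) = \text{tr}\big(\operatorname{sign}(A)\,dA\big)$$
$$dD = \text{tr}\big(\rho S^H \operatorname{sign}(A)\,dS\big) + \text{tr}\big(\operatorname{sign}(A)\,S\rho\,(dS)^H\big) = 2\,\text{Re}\,\text{tr}\big(\rho S^H \operatorname{sign}(A)\,dS\big)$$

The $(dS)^H$ term is the complex conjugate of the $dS$ term, so for a real-valued $D$ of a complex $S$ the natural target is $dD = 2\,\text{Re}\,\text{tr}(K\,dS)$ with $K = \rho S^H\operatorname{sign}(A)$, rather than $\text{tr}(K\,dS)$ alone. Under the real inner product $\langle X, Y\rangle = \text{Re}\,\text{tr}(X^H Y)$ this corresponds to the gradient $2K^H = 2\operatorname{sign}(A)\,S\rho$. This is the unconstrained derivative; it ignores $S^HS = \mathbb{I}$. The NumPy sketch below checks the expression against a central finite difference along a random direction $E$; the helper names and random data are hypothetical.

```python
import numpy as np

def hermitian_part(X):
    return 0.5 * (X + X.conj().T)

def D(S, rho, sigma):
    """tr|A| with A = S rho S^H - sigma (A Hermitian)."""
    A = hermitian_part(S @ rho @ S.conj().T - sigma)
    return float(np.sum(np.abs(np.linalg.eigvalsh(A))))

def candidate_K(S, rho, sigma):
    """K = rho S^H sign(A); assumes A has no zero eigenvalue."""
    A = hermitian_part(S @ rho @ S.conj().T - sigma)
    w, U = np.linalg.eigh(A)
    sign_A = (U * np.sign(w)) @ U.conj().T
    return rho @ S.conj().T @ sign_A

rng = np.random.default_rng(2)
N, M = 5, 3

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def density(n):
    G = crandn(n, n)
    P = G @ G.conj().T
    return P / np.trace(P).real

S, _ = np.linalg.qr(crandn(N, M))   # isometry, S^H S = I
rho, sigma = density(M), density(N)

E = crandn(N, M)                    # arbitrary direction dS (leaves the constraint set)
t = 1e-6
numeric = (D(S + t * E, rho, sigma) - D(S - t * E, rho, sigma)) / (2 * t)
K = candidate_K(S, rho, sigma)
analytic = 2 * np.trace(K @ E).real  # 2 Re tr(K dS)
grad = 2 * K.conj().T                # = 2 sign(A) S rho, shape N x M
print(numeric, analytic)
```

If the constraint $S^HS = \mathbb{I}$ has to be respected (e.g. for optimization on the Stiefel manifold), this unconstrained gradient would still need to be projected onto the tangent space.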