
Let $A$ be a real-valued $n\times n$ matrix, and set $S := \sqrt{A^t A}$. What is the matrix derivative of $S$ with respect to $A$, i.e., what is $\frac{\partial S}{\partial A}$?

(To avoid invertibility issues, I am happy to assume for now that $A^t A$ is positive definite, although in the applications I have in mind it may only be positive semidefinite.)

Thanks for any help!

  • What is the motivation you have in mind? What was your attempt at solving the problem? – A. Pongrácz Jan 02 '19 at 19:59
  • see https://math.stackexchange.com/questions/1884536/does-the-matrix-square-root-have-directional-derivatives-at-semipositive-points and references therein – rych Jan 20 '19 at 11:23

2 Answers


If you are familiar with the vec operation for matrices, then you could proceed as follows.
$$\begin{aligned}
S\,S &= A^TA \\
S\,dS\,I + I\,dS\,S &= A^T\,dA\,I + I\,dA^T\,A \\
\big(I^T\otimes S + S^T\otimes I\big)\,{\rm vec}(dS) &= \big(I^T\otimes A^T\big)\,{\rm vec}(dA) + \big(A^T\otimes I\big)\,{\rm vec}(dA^T) \\
\big(I\otimes S + S\otimes I\big)\,{\rm vec}(dS) &= \Big((I\otimes A^T) + (A^T\otimes I)\,K\Big)\,{\rm vec}(dA) \\
\frac{\partial\,{\rm vec}(S)}{\partial\,{\rm vec}(A)} &= \big(I\otimes S + S\otimes I\big)^+\Big((I\otimes A^T) + (A^T\otimes I)\,K\Big)
\end{aligned}$$
where $M^+$ denotes the pseudoinverse of $M$, $I$ is the identity matrix, and $K$ is the commutation matrix, defined by ${\rm vec}(X^T) = K\,{\rm vec}(X)$. The third line uses the identity ${\rm vec}(LXR) = (R^T\otimes L)\,{\rm vec}(X)$, and the fourth uses the fact that $I$ and $S$ are symmetric. When $A^TA$ is positive definite, the eigenvalues of $I\otimes S + S\otimes I$ are the sums $s_i + s_j > 0$ of eigenvalues of $S$, so in that case the pseudoinverse is an ordinary inverse.
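A quick numerical sanity check of this formula, as a minimal NumPy sketch. It assumes column-major ${\rm vec}$ (the convention under which ${\rm vec}(LXR) = (R^T\otimes L)\,{\rm vec}(X)$) and a well-conditioned test matrix so that $A^TA$ is positive definite. The helper names `vec`, `commutation_matrix`, `psd_sqrt`, `sqrt_ata_jacobian` and `finite_difference_jacobian` are hypothetical, chosen only for illustration.

```python
import numpy as np


def vec(M):
    # Column-major vec: stacks the columns of M, so vec(L X R) = kron(R.T, L) @ vec(X)
    return M.reshape(-1, order="F")


def commutation_matrix(n):
    # K with vec(X.T) == K @ vec(X) for every n x n matrix X
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # X[i, j] is entry i + j*n of vec(X) and entry j + i*n of vec(X.T)
            K[j + i * n, i + j * n] = 1.0
    return K


def psd_sqrt(M):
    # Symmetric PSD square root via an eigendecomposition (clips tiny negative eigenvalues)
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


def sqrt_ata_jacobian(A):
    # d vec(S) / d vec(A) for S = sqrt(A^T A), following the formula above
    n = A.shape[0]
    S = psd_sqrt(A.T @ A)
    I = np.eye(n)
    K = commutation_matrix(n)
    lhs = np.kron(I, S) + np.kron(S, I)
    rhs = np.kron(I, A.T) + np.kron(A.T, I) @ K
    return np.linalg.pinv(lhs) @ rhs


def finite_difference_jacobian(A, h=1e-5):
    # Central differences, one column per entry of vec(A)
    n = A.shape[0]
    J = np.zeros((n * n, n * n))
    for k in range(n * n):
        dA = np.zeros(n * n)
        dA[k] = h
        dA = dA.reshape(n, n, order="F")
        S_plus = psd_sqrt((A + dA).T @ (A + dA))
        S_minus = psd_sqrt((A - dA).T @ (A - dA))
        J[:, k] = vec(S_plus - S_minus) / (2 * h)
    return J


rng = np.random.default_rng(0)
# Hypothetical, well-conditioned test matrix (A^T A positive definite)
A = 3 * np.eye(3) + 0.5 * rng.standard_normal((3, 3))
J = sqrt_ata_jacobian(A)
# Should be small, limited only by finite-difference error
print(np.max(np.abs(J - finite_difference_jacobian(A))))
```

The finite-difference comparison is only a check of the analytic expression; for positive semidefinite $A^TA$ with zero eigenvalues the square root is not differentiable everywhere, so such a check is only meaningful in the positive definite case.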

greg
  • thanks for your help! – user2019 Jan 04 '19 at 15:09
  • Why can you get away with just needing a pseudoinverse instead of an inverse in the last line? Usually I think of the pseudoinverse when finding $b$ such that $Xb\approx Y$ and then $\hat{b}=(X^{T}X)^{-1}X^{T}Y$. But that's not going to make $Xb$ EQUAL to $Y$ unless $X$ were invertible. – Kashif Sep 08 '20 at 08:33
  • @Glassjawed Your objection is certainly valid. Every possible solution of the linear differential relationship $$B\,dy = A\,dx$$ can be written using an arbitrary $p$-vector and the pseudoinverse $$dy = B^+A\,dx + (I-B^+B)\,p$$ The $p$-term is completely independent of the $x$-vector, so to the extent that $\left(\frac{\partial y}{\partial x}\right)$ exists at all, it is given by the matrix coefficient of the $dx$-term. – greg Sep 08 '20 at 19:00
  • That's really neat. Thank you! – Kashif Sep 09 '20 at 08:08
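To illustrate greg's point about the pseudoinverse, here is a minimal NumPy sketch with a deliberately singular, hypothetical $B$ (rank 2) and an $A$ chosen as $A = BM$ so that $A\,dx$ always lies in the range of $B$. Under that consistency assumption, $dy = B^+A\,dx + (I-B^+B)\,p$ satisfies $B\,dy = A\,dx$ for every $p$, because $BB^+B = B$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical singular B (rank 2): its third row and column are zero
B = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 0.0]])
# Consistency assumption: A = B @ M, so A @ dx always lies in range(B)
A = B @ rng.standard_normal((3, 3))

B_pinv = np.linalg.pinv(B)
dx = rng.standard_normal(3)

# Every choice of p gives a solution; the p-term lies in the null space of B
for _ in range(3):
    p = rng.standard_normal(3)
    dy = B_pinv @ A @ dx + (np.eye(3) - B_pinv @ B) @ p
    print(np.linalg.norm(B @ dy - A @ dx))  # expected: round-off level
```

Only the coefficient $B^+A$ multiplies $dx$; the arbitrary $p$-term changes $dy$ but never the residual, which is why it plays no role in $\partial y/\partial x$.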

Recall that you can perform a polar decomposition on $A$ such that:

$$A = QS$$

where $Q$ is an orthogonal matrix and $S = \sqrt{A^TA}$ is precisely the matrix you describe above. $S$ is always uniquely determined, and $Q$ is unique when $A$ is invertible. Hence, what you really want to calculate is:

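$$\frac{\partial S}{\partial A} = \frac{\partial [Q^{-1}A]}{\partial A} = Q^{-1}\otimes\mathbf{I}$$

This treats $Q$ as held fixed; since $Q$ itself generally varies with $A$, it does not capture the full dependence of $S$ on $A$.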

Note that the result is a fourth-order tensor (a linear map from matrices to matrices), since it describes the rate of change of the second-order object $S$ with respect to another second-order object $A$.
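For experimenting with the polar decomposition itself, SciPy's `scipy.linalg.polar` computes it; with `side="right"` it returns $(Q, S)$ with $A = QS$. The sketch below, using a hypothetical random test matrix, checks that $Q$ is orthogonal and that $S$ agrees with $\sqrt{A^TA}$ computed independently from an eigendecomposition (`psd_sqrt` is a helper defined here for illustration).

```python
import numpy as np
from scipy.linalg import polar


def psd_sqrt(M):
    # Symmetric PSD square root via an eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))  # hypothetical test matrix

Q, S = polar(A, side="right")  # A = Q @ S

print(np.allclose(Q @ S, A))                 # factorization reproduces A
print(np.allclose(Q.T @ Q, np.eye(4)))       # Q is orthogonal
print(np.allclose(S, psd_sqrt(A.T @ A)))     # S equals sqrt(A^T A)
```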

  • thanks for your help! – user2019 Jan 04 '19 at 15:09
  • No problem @user2019! If you find either my answer or greg’s to be correct/sufficient, remember to check it off/upvote it. – aghostinthefigures Jan 04 '19 at 15:12
  • @aghostinthefigures Hey, how did you get the Kronecker product of Q inverse and the identity? Is there some underlying method for taking the derivative of a matrix w.r.t. a matrix that you are using? Can you please provide a reference for this rule/method? Thank you – user35687 Apr 30 '21 at 22:26
  • @user35687 It's actually a tensor product here (of which the Kronecker product is a flattened representation). Depending on convention, you have the rule $\frac{\partial AXB^\top}{\partial X} = A\otimes B$ or $\frac{\partial AXB^\top}{\partial X} = B\otimes A$. The former is the case if you define $(A\otimes B)_{ij,kl} = A_{ik}B_{jl}$, since then $(AXB^\top)_{ij} = \sum_{kl} A_{ik}B_{jl}X_{kl} =: \big((A\otimes B)\cdot X\big)_{ij}$ is a tensor contraction. – Hyperplane Aug 21 '22 at 14:56
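The two conventions in the last comment can be checked numerically. In this minimal NumPy sketch (random, hypothetical $A$, $B$, $X$), the contraction with $T_{ijkl} = A_{ik}B_{jl}$ reproduces $AXB^\top$; with row-major flattening this $T$ is exactly `np.kron(A, B)`, while with column-major ${\rm vec}$ the matching matrix is `np.kron(B, A)`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A, B, X = (rng.standard_normal((n, n)) for _ in range(3))

target = A @ X @ B.T

# Tensor contraction with T[i, j, k, l] = A[i, k] * B[j, l]
T = np.einsum("ik,jl->ijkl", A, B)
contracted = np.einsum("ijkl,kl->ij", T, X)

# Row-major flattening: kron(A, B) acts on X.reshape(-1)
row_major = (np.kron(A, B) @ X.reshape(-1)).reshape(n, n)

# Column-major vec: kron(B, A) acts on vec(X)
col_major = (np.kron(B, A) @ X.reshape(-1, order="F")).reshape(n, n, order="F")

print(np.allclose(contracted, target),
      np.allclose(row_major, target),
      np.allclose(col_major, target))
```

Which Kronecker ordering is "the" derivative is therefore purely a matter of how the matrices are flattened into vectors.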