
Let $A\in\mathbb{R}^{M\times N}$, $B\in\mathbb{R}^{M\times 1}$, and $\Lambda\triangleq\text{diag}(\lambda)$, where $\lambda\triangleq[\lambda_1,\ldots,\lambda_N]$, so that $\Lambda$ is $N\times N$, the same size as $A^\text{T}A$.

I need to compute the following derivative: $$\frac{d}{d\lambda}\left[(A^\text{T}A+\Lambda)^{-1}A^\text{T}B\right].$$

Working it out, I got, for all $i=1,\ldots,N$, $$-\frac{\partial}{\partial\lambda_i}\left[(A^\text{T}A+\Lambda)^{-1}A^\text{T}B\right]=(A^\text{T}A+\Lambda)^{-1}E_i(A^\text{T}A+\Lambda)^{-1}A^\text{T}B,$$ where $E_i$ is the matrix whose entries are all $0$ except the $i$th diagonal element, which is $1$.

Is this correct? I feel like something is missing or incorrect.
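A quick way to test a formula like this is to compare it against central finite differences. Below is a minimal sketch in plain Python (standard library only). The matrix sizes, the random seed, the positive range for $\lambda$, and all helper names (`matmul`, `inverse`, `W_of`, `F_of`, `claimed_rhs`, `max_abs_error`) are illustrative assumptions, not anything from the question.

```python
import random

# Matrices are plain lists of rows, so the check needs only the standard library.

def matmul(X, Y):
    """Matrix product X @ Y."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse(X):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(X)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(X)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def W_of(A, lam):
    """W = (A^T A + diag(lam))^{-1}; lam needs one entry per column of A."""
    M = matmul(transpose(A), A)
    for k, lk in enumerate(lam):
        M[k][k] += lk
    return inverse(M)

def F_of(A, B, lam):
    """F = (A^T A + diag(lam))^{-1} A^T B."""
    return matmul(W_of(A, lam), matmul(transpose(A), B))

def claimed_rhs(A, B, lam, i):
    """Claimed value of -dF/dlam_i, namely W E_i W A^T B = W E_i F."""
    W = W_of(A, lam)
    n = len(lam)
    E_i = [[1.0 if r == c == i else 0.0 for c in range(n)] for r in range(n)]
    return matmul(matmul(W, E_i), F_of(A, B, lam))

def max_abs_error(A, B, lam, i, h=1e-6):
    """Largest entrywise gap between -dF/dlam_i (central differences) and the claim."""
    up, dn = list(lam), list(lam)
    up[i] += h
    dn[i] -= h
    F_up, F_dn = F_of(A, B, up), F_of(A, B, dn)
    numeric = [[-(u - d) / (2 * h) for u, d in zip(ru, rd)]
               for ru, rd in zip(F_up, F_dn)]
    claimed = claimed_rhs(A, B, lam, i)
    return max(abs(x - y)
               for rn, rc in zip(numeric, claimed)
               for x, y in zip(rn, rc))

random.seed(0)
m, n = 5, 3  # A is m x n; B is m x 1 as in the question
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
B = [[random.gauss(0, 1)] for _ in range(m)]
lam = [random.uniform(0.5, 2.0) for _ in range(n)]  # positive, so A^T A + diag(lam) is invertible
errors = [max_abs_error(A, B, lam, i) for i in range(n)]
print(errors)  # every entry should be at the level of finite-difference noise
```

A sign error or a misplaced factor in the claimed expression would show up as errors comparable in size to the derivative itself rather than as tiny noise.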

  • It seems correct to me if $A$ and $B$ are independent of $\lambda_i$. Why do you think it is incorrect? And if you provide the steps of your work someone can pinpoint the error if there is one. – obareey Mar 07 '22 at 08:25
  • Definitely correct, see that related post: https://math.stackexchange.com/questions/1471825/derivative-of-the-inverse-of-a-matrix – KBS Apr 17 '22 at 10:59


$ \def\e{\varepsilon}\def\l{\lambda}\def\L{\Lambda} \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\vecc#1{\operatorname{vec}\LR{#1}} \def\diag#1{\operatorname{diag}\LR{#1}} \def\Diag#1{\operatorname{Diag}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $It's difficult to visually distinguish $\L$ from $A$, so I'll rename $\,\L\to L.$

The gradient of a vector with respect to one of its components is $$\eqalign{ \grad{\l}{\l_k} &= \e_k }$$ where $\e_k$ is the $k^{th}$ standard basis vector.

Use this to calculate $$\grad{L}{\l_k} = \grad{\Diag\l}{\l_k} = \Diag{\e_k}$$
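For instance, with three components and $k=2$ (sizes chosen only for illustration), differentiating $L$ picks out a single diagonal entry, which is exactly the matrix $E_2$ described in the question: $$\frac{\partial L}{\partial\lambda_2} = \operatorname{Diag}(\varepsilon_2) = \left[\begin{array}{ccc}0&0&0\\0&1&0\\0&0&0\end{array}\right]$$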

Define the following matrix variables $$\eqalign{ E_k &= \Diag{\e_k} \\ M &= \LR{A^TA+L} &\qiq dM &= dL \\ W &= M^{-1} &\qiq dW &= -W\,dL\,W \\ }$$ where the last differential comes from differentiating $WM=I$, which gives $dW\,M+W\,dM=0$. Since $A$ and $B$ are constant, substituting these into the function of interest gives $$\eqalign{ F &= WA^TB \\ dF &= dW\,A^TB \\ &= -W\,dL\,WA^TB \\ &= -W\,dL\,F \\ \grad{F}{\l_k} &= -W\gradLR{L}{\l_k}F \\ &= -WE_kF \\ }$$ which confirms your own result.
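Because $B$ in the question is a single column, $F$ is a vector, and the component formulas can be stacked into the full derivative $d F/d\lambda$ that the question asks for. A short worked step, assuming that single-column $B$; the symbols $f_k$ (the $k$th entry of $F$) and $J$ (the Jacobian) are introduced here only for illustration: $$\eqalign{ E_kF &= \e_k\e_k^TF = f_k\,\e_k \\ \grad{F}{\l_k} &= -f_k\,W\e_k \\ J = \grad{F}{\l} &= -W\,\Diag{F} \\ }$$ so the $k$th column of the Jacobian is the $k$th column of $W$ scaled by $-f_k$.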

greg