Let $\alpha_i\in\mathbb{R}$ and $x_i\in\mathbb{R}^d$ for all $i\in[k]$, with $k \geq d$. I am looking for the derivative

$$ \frac{\partial}{\partial\alpha_i} \left(\sum_{j=1}^k\alpha_jx_jx_j^\top\right)^{-1} = \frac{\partial}{\partial\alpha_i} \left(X\Lambda X^\top\right)^{-1}, $$

where $X \in \mathbb{R}^{d\times k}$ is the matrix whose columns are $x_1,\dots,x_k$, $\Lambda = \text{diag}(\alpha)$, and we assume $X\Lambda X^\top$ is invertible.

  • Yes, defining $G(\alpha) = X\Lambda X^\top$, if $h \neq \alpha_i$ for all $i$, then $G(\alpha-h)$ is full rank and hence its inverse exists. But I don't see how to proceed from here to find the derivative. –  Mar 24 '21 at 07:42
  • The matrix has rank at most $k$ (its image is contained in the span of $\{x_i\}$), thus for $k<d$ the matrix cannot be invertible. – lisyarus Mar 24 '21 at 08:01
  • You are right! updated the question! –  Mar 24 '21 at 08:02

2 Answers

Let $A=X\Lambda X^T$. Then $$\frac{\partial}{\partial \alpha_i}A^{-1} = -A^{-1} \left( \frac{\partial}{\partial \alpha_i} A \right)A^{-1}$$ (see Derivative of the inverse of a matrix), provided $A^{-1}$ exists. Finally, $$\frac{\partial}{\partial \alpha_i} A= x_i x_i^T,$$ so $$\frac{\partial}{\partial \alpha_i}A^{-1} = -A^{-1} x_i x_i^T A^{-1}.$$
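
For completeness, the identity for the derivative of an inverse used above can be recovered by differentiating $AA^{-1} = I$ with respect to $\alpha_i$. This is a short sketch, assuming $A$ depends differentiably on $\alpha_i$ and stays invertible near the point of interest:

$$\frac{\partial A}{\partial \alpha_i}\,A^{-1} + A\,\frac{\partial A^{-1}}{\partial \alpha_i} = 0 \quad\Longrightarrow\quad \frac{\partial A^{-1}}{\partial \alpha_i} = -A^{-1}\,\frac{\partial A}{\partial \alpha_i}\,A^{-1}.$$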

lisyarus
  • Thanks @lisyarus. Can we also claim that $\frac{\partial}{\partial \alpha}A^{-1} = -A^{-1} XX^TA^{-1}$? –  Mar 24 '21 at 08:11
  • @temporary_freak I don't think so: the derivative of a matrix with respect to a vector has to be a third-order tensor, not just a matrix. The derivative of a matrix with respect to a scalar is already a matrix. – lisyarus Mar 24 '21 at 08:15
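
The third-order tensor mentioned in the last comment can be made concrete numerically. The following is a minimal NumPy sketch, not something from the thread: the random data, the sizes `d` and `k`, the positive weights, and the names `B`, `T`, `summed` are all assumptions chosen for illustration. It stacks the $k$ partial derivatives into an array of shape $(d, d, k)$ and shows that summing over the last index yields $-A^{-1}XX^TA^{-1}$, which is the derivative along the all-ones direction rather than the full derivative with respect to $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 3, 4                               # illustrative sizes with k >= d
X = rng.standard_normal((d, k))           # columns are x_1, ..., x_k
alpha = rng.uniform(1.0, 2.0, size=k)     # positive weights (assumption) keep A invertible

A_inv = np.linalg.inv((X * alpha) @ X.T)  # (X * alpha) scales column j by alpha_j
B = A_inv @ X                             # column i is A^{-1} x_i

# T[a, b, i] = d(A^{-1})[a, b] / d(alpha_i) = -(A^{-1} x_i)_a (A^{-1} x_i)_b
T = -np.einsum("ai,bi->abi", B, B)

# Summing the k slices gives -A^{-1} X X^T A^{-1} (A is symmetric, so B B^T equals it)
summed = T.sum(axis=2)

print(T.shape)
print(np.allclose(summed, -A_inv @ X @ X.T @ A_inv))
```

Each slice `T[:, :, i]` is the matrix $-A^{-1}x_ix_i^TA^{-1}$ from the answer; only their sum collapses to a single matrix.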

Given fat matrix ${\rm V} \in \Bbb R^{d \times n}$, let

$${\rm F} ({\rm x}) := {\rm V} \,\mbox{diag} ({\rm x}) {\rm V}^\top, \qquad {\rm G} ({\rm x}) := \left( {\rm F} ({\rm x}) \right)^{-1}$$

Using Sherman-Morrison,

$$\begin{aligned} {\rm G} ({\rm x} + h \, {\rm e}_i) = \left( {\rm F} ({\rm x} + h \, {\rm e}_i) \right)^{-1} = \left( {\rm F} ({\rm x}) + h \, {\rm v}_i {\rm v}_i^\top \right)^{-1} &= {\rm G} ({\rm x}) - h \,\frac{{\rm G} ({\rm x}) \, {\rm v}_i {\rm v}_i^\top {\rm G} ({\rm x})}{1 + h \, {\rm v}_i^\top {\rm G} ({\rm x}) \,{\rm v}_i} \\ &= {\rm G} ({\rm x}) - h \,{\rm G} ({\rm x}) \, {\rm v}_i {\rm v}_i^\top {\rm G} ({\rm x}) + \mathcal O \left( h^2 \right)\end{aligned}$$

and, thus,

$$\partial_i {\rm G} ({\rm x}) = \color{blue}{- {\rm G} ({\rm x}) \, {\rm v}_i {\rm v}_i^\top {\rm G} ({\rm x})}$$

which is what Mr. Fox (lisyarus) obtained.
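
Both the closed-form derivative and the Sherman-Morrison step can be checked numerically. The sketch below is illustrative only: the random matrix `V`, the positive entries of `x`, the index `i`, and the step sizes `h` and `step` are assumptions, not values from the answers. It compares $-{\rm G}\,{\rm v}_i{\rm v}_i^\top{\rm G}$ against a central finite difference, and compares the exact rank-one update against a direct inverse.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5                              # fat matrix: d < n
V = rng.standard_normal((d, n))
x = rng.uniform(1.0, 2.0, size=n)        # positive weights (assumption) keep F(x) invertible


def G(x):
    """Inverse of F(x) = V diag(x) V^T."""
    return np.linalg.inv((V * x) @ V.T)


i, h = 2, 1e-6
v = V[:, i]                              # the column v_i
e = np.zeros(n)
e[i] = 1.0                               # standard basis vector e_i
Gx = G(x)

# Closed-form partial derivative: -G v_i v_i^T G
analytic = -Gx @ np.outer(v, v) @ Gx

# Central finite difference in the i-th coordinate
numeric = (G(x + h * e) - G(x - h * e)) / (2 * h)

# Exact Sherman-Morrison update for a finite (not infinitesimal) step
step = 0.1
sm = Gx - step * (Gx @ np.outer(v, v) @ Gx) / (1 + step * (v @ Gx @ v))

print(np.allclose(analytic, numeric, atol=1e-6))
print(np.allclose(sm, G(x + step * e)))
```

Note that the denominator of the update uses ${\rm G}({\rm x})$, the inverse, which is why the first-order term in $h$ is unaffected by it.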