Given the $d$-vector $\mathbf a$ and the $d \times d$ matrix $\mathbf M$, let $$f (\mathbf{x}) := (\mathbf a-\mathbf x)^\top\mathbf M^{-1}(\mathbf a-\mathbf x)$$ Find the gradient $\nabla_{\mathbf{x}} f (\mathbf{x})$.


I obtained the following. Is it right?

$$\nabla_{\mathbf{x}} f (\mathbf{x}) = (\mathbf M^{-1}+(\mathbf M^{-1})^\top)(\mathbf a - \mathbf x^\top)$$

Also, what are numerator-layout notation and denominator-layout notation, and how do I know which one to use?

1 Answer

Your answer is incorrect, since you cannot subtract the row vector ${\bf x}^T$ from the column vector ${\bf a}$. Instead, consider that:

$$({\bf A x})_{i} =\sum_j A_{ij} x_j $$

So that: $${\bf x}^T{\bf A x} =\sum_{i}x_i\left(\sum_j A_{ij}x_j\right)$$ And: $$\frac{\partial }{\partial x_k}\left({\bf x}^T{\bf A x}\right) = \frac{\partial }{\partial x_k}\left( \sum_{i\neq k}x_i\left(\sum_j A_{ij}x_j\right) + x_k\left(\sum_j A_{kj}x_j\right) \right) $$

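$$=\sum_{i\neq k}x_iA_{ik} + \sum_{i\neq k} A_{ki}x_i+2A_{kk} x_k=(({\bf A} + {\bf A}^T){\bf x})_{k}$$

Collecting the components $k = 1, \dots, d$ into a single vector: $$\nabla_{\bf x}\left({\bf x}^T{\bf A x}\right) =\left({\bf A} + {\bf A}^T\right){\bf x}$$

In your case, $f({\bf x}) = ({\bf x}-{\bf a})^T{\bf M}^{-1}({\bf x}-{\bf a})$, because the two sign flips cancel. Since ${\bf a}$ is constant, applying the result above with ${\bf A} = {\bf M}^{-1}$ to ${\bf y} = {\bf x}-{\bf a}$, whose Jacobian with respect to ${\bf x}$ is the identity, you should get: $$\nabla_{\bf x} f({\bf x}) = \left({\bf M}^{-1} + {\bf M}^{-T}\right)({\bf x}-{\bf a})$$ So apart from the stray transpose, note the sign: the vector factor is ${\bf x}-{\bf a}$, not ${\bf a}-{\bf x}$.

Note that we could equivalently have decided to write this as: $$({\bf x}-{\bf a})^T\left({\bf M}^{-1} + {\bf M}^{-T}\right)$$

Which one is correct? They both contain the same entries, but one is a column vector and the other is a row vector. One can define the derivative of a scalar with respect to a vector as either a row or a column, and this choice is exactly the difference between "numerator-layout notation" (row vector) and "denominator-layout notation" (column vector, the usual gradient). The choice is arbitrary - different authors choose differently, but it is important that you choose consistently.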
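If you want to convince yourself numerically, a finite-difference check is a quick way to test a gradient formula like this one. The following is only a minimal sketch using NumPy: the helper names (`f`, `grad_f`, `numerical_grad`), the dimension, and the random test data are all assumptions made for illustration, and ${\bf M}$ is deliberately chosen non-symmetric so that ${\bf M}^{-1} \neq {\bf M}^{-T}$.

```python
import numpy as np

def f(x, a, M_inv):
    # f(x) = (a - x)^T M^{-1} (a - x)
    r = a - x
    return r @ M_inv @ r

def grad_f(x, a, M_inv):
    # Candidate gradient: (M^{-1} + M^{-T}) (x - a)
    return (M_inv + M_inv.T) @ (x - a)

def numerical_grad(func, x, eps=1e-6):
    # Central differences, one coordinate at a time
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = eps
        g[k] = (func(x + e) - func(x - e)) / (2 * eps)
    return g

# Illustrative test data (assumed, not from the question)
rng = np.random.default_rng(0)
d = 4
M = np.eye(d) + 0.1 * rng.normal(size=(d, d))  # non-symmetric perturbation of the identity
M_inv = np.linalg.inv(M)
a = rng.normal(size=d)
x = rng.normal(size=d)

analytic = grad_f(x, a, M_inv)
numeric = numerical_grad(lambda z: f(z, a, M_inv), x)
max_abs_error = np.max(np.abs(analytic - numeric))
print(max_abs_error)  # should be tiny compared with the gradient entries
```

Replacing `x - a` by `a - x` in `grad_f` makes the check fail, which is an easy way to catch the sign error. Note that 1-D NumPy arrays carry no row/column distinction, so this check is agnostic to the layout convention.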