
I want to see the following step in greater detail:

$$\nabla_x~\bigg[(\vec{x}-\vec{\mu})^TP^{-1}(\vec{x}-\vec{\mu}) \bigg] ~~~~~(3)$$

$$= P^{-1}(\vec{x}-\vec{\mu})$$

The textbook says the above partial derivative was performed by making use of the fact that $P$ is a symmetric matrix, along with the following identities:

$$\frac{\partial}{\partial{\mathbf{x}}}(\mathbf{x}^T \mathbf{a}) = \frac{\partial}{\partial{\mathbf{x}}}(\mathbf{a}^T \mathbf{x}) = \mathbf{a}~~~~~~~(1)$$

$$\frac{\partial}{\partial{\mathbf{x}}} (\mathbf{A}\mathbf{B}) = \frac{\partial A}{\partial{\mathbf{x}}}\mathbf{B} + \mathbf{A} \frac{\partial \mathbf{B}}{\partial{\mathbf{x}}}~~~~~~(2)$$

where $\mathbf{a}$ and $\mathbf{x}$ are vectors and $\mathbf{A}$ and $\mathbf{B}$ are matrices.

I'm still not entirely sure I understand how to perform this differentiation on the matrix/vector expression in (3). Would this involve applying rule (2) and then rule (1)?
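
For reference, rule (1) at least is easy to sanity-check numerically with finite differences (a minimal sketch, assuming NumPy; the test vectors are arbitrary values, not from the textbook):

```python
import numpy as np

# Finite-difference sanity check of rule (1): d/dx (a^T x) = a.
# a and x are arbitrary test vectors, not from the textbook.
rng = np.random.default_rng(0)
n = 4
a = rng.normal(size=n)
x = rng.normal(size=n)

def f(v):
    return a @ v  # the scalar a^T x

eps = 1e-6
grad = np.array([
    (f(x + eps * np.eye(n)[k]) - f(x - eps * np.eye(n)[k])) / (2 * eps)
    for k in range(n)
])
print(np.allclose(grad, a))  # True: the gradient is just a
```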

pico
  • Equation (2) is not valid when dealing with vectors and matrices, and Equation (3) is missing a factor of two. It'll be hard to prove the derivation until you get the equations right. – greg Sep 27 '19 at 18:53
  • https://youtu.be/i6fqfH5hx60 – pico Sep 27 '19 at 19:00

2 Answers


I don't think the textbook explanation is correct. Equation (2) doesn't apply here, since we're mixing matrices and vectors, and equation (1) doesn't get us far enough.

It may be easier to derive the gradient directly from scratch. For brevity, write $u:=x-\mu$ and $A:=P^{-1}$, so $x$ is a vector and $A$ is a constant square matrix. We seek the gradient of $u^TAu$ with respect to $x$. Notice that $u^TAu$ is a scalar, so its gradient is a vector of the same size as $x$, with component $k$ equal to:

$$
\begin{align}
\frac\partial{\partial x_k} (u^TAu)&\stackrel{(a)}=\frac\partial{\partial x_k}\sum_i\sum_j u_iA_{ij}u_j\\
&\stackrel{(b)}=\sum_i\sum_j\frac\partial{\partial x_k}(u_iA_{ij}u_j)\\
&\stackrel{(c)}=\sum_i\sum_j\left(\frac{\partial u_i}{\partial x_k}A_{ij}u_j+u_iA_{ij}\frac{\partial u_j}{\partial x_k}\right)\\
&\stackrel{(d)}=\sum_i\sum_j\left(\delta_{ik}A_{ij}u_j+u_iA_{ij}\delta_{jk}\right)\\
&\stackrel{(e)}=\sum_j A_{kj}u_j +\sum_iu_iA_{ik}\\
&\stackrel{(f)}=(Au)_k + (A^Tu)_k\\
&=\left((A+A^T)u\right)_k
\end{align}
$$

Step (a) is the definition of matrix multiplication; step (b) is linearity of the (univariate) derivative; step (c) is the (univariate) product rule for derivatives. In step (d) we use the fact that $\partial u_i/\partial x_k=\partial(x_i-\mu_i)/\partial x_k=\delta_{ik}$, where $\delta_{ik}$ is the Kronecker delta. In step (e) we drop the terms where the Kronecker delta is zero. In step (f) we apply the definition of matrix multiplication again; the notation $(Au)_k$ means the $k$th component of the vector $Au$.

Conclude: the $k$th component of the gradient equals the $k$th component of $(A+A^T)u$. Here $A=P^{-1}$ is symmetric (the inverse of a symmetric matrix is symmetric), so this simplifies to $$\frac\partial{\partial x}u^TAu=2Au,$$ i.e. $\nabla_x\big[(x-\mu)^TP^{-1}(x-\mu)\big]=2P^{-1}(x-\mu)$, which matches your textbook's result up to the factor of $2$ noted in the comments.
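
You can also verify this derivation numerically with finite differences (a minimal sketch, assuming NumPy; $P$, $x$, and $\mu$ are arbitrary test values):

```python
import numpy as np

# Finite-difference check of the derivation above. P is built
# symmetric positive definite so that A = P^{-1} is symmetric,
# matching the problem; x and mu are arbitrary test vectors.
rng = np.random.default_rng(1)
n = 3
M = rng.normal(size=(n, n))
P = M @ M.T + n * np.eye(n)
A = np.linalg.inv(P)
x = rng.normal(size=n)
mu = rng.normal(size=n)

def f(v):
    u = v - mu
    return u @ A @ u  # the scalar u^T A u

eps = 1e-6
grad = np.array([
    (f(x + eps * np.eye(n)[k]) - f(x - eps * np.eye(n)[k])) / (2 * eps)
    for k in range(n)
])
u = x - mu
print(np.allclose(grad, (A + A.T) @ u))  # True: the general formula
print(np.allclose(grad, 2 * A @ u))      # True here, since A is symmetric
```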

grand_chat

This is the derivative you want:

$$\frac{\partial}{\partial x}\bigg(x^TAx \bigg)=x^T(A+A^T) ~~~~~(1)$$

Further, if the matrix $A$ is symmetric, then $A=A^T$.

Thus, for a symmetric matrix, (1) becomes:

$$\frac{\partial}{\partial x}\bigg(x^TAx \bigg)=2x^T A$$

Note that this result is a row vector, $2x^TA$.

If you want it as a column vector instead of a row vector, just take the transpose of the result:

$$(2x^TA)^T=(2A^Tx)$$

Again, since $A$ is symmetric, $A^T=A$:

$$=2Ax$$
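
Both layout conventions can be checked numerically (a minimal sketch, assuming NumPy; $A$ and $x$ are arbitrary test values):

```python
import numpy as np

# Check of both layout conventions. A is made symmetric so that (1)
# reduces to the 2x^T A form above; x is an arbitrary test vector.
rng = np.random.default_rng(2)
n = 3
B = rng.normal(size=(n, n))
A = B + B.T  # symmetric test matrix
x = rng.normal(size=n)

def f(v):
    return v @ A @ v  # the scalar x^T A x

eps = 1e-6
grad = np.array([
    (f(x + eps * np.eye(n)[k]) - f(x - eps * np.eye(n)[k])) / (2 * eps)
    for k in range(n)
])
print(np.allclose(grad, 2 * A @ x))  # column form 2Ax
print(np.allclose(grad, 2 * x @ A))  # same numbers as the row form 2x^T A
```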