
We have an $n \times n$ matrix $L$, given by the following Gaussian kernel $$L_{i,j} = \exp\left(-\frac{(x_i-x_j)^2} {2\sigma ^2} \right)$$

where the points $x_i$ and $x_j$ are real numbers that can be thought of as the positions of points $i$ and $j$. ($L$ can be seen as a covariance matrix, describing a covariance that depends on the distance between points: the greater the distance between points $i$ and $j$, the smaller the covariance.)
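For concreteness, here is a minimal NumPy sketch of this kernel, using made-up positions and an illustrative $\sigma$ (the numbers are not from the problem):

```python
import numpy as np

# Hypothetical positions x_i and bandwidth sigma, purely for illustration
x = np.array([0.1, 0.5, 1.2, 2.0])
sigma = 0.7

# L[i, j] = exp(-(x_i - x_j)^2 / (2 sigma^2))
diff = x[:, None] - x[None, :]
L = np.exp(-diff**2 / (2 * sigma**2))

# L is symmetric with unit diagonal, as expected for this kernel
print(np.allclose(L, L.T), np.allclose(np.diag(L), 1.0))
```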

I am interested in finding $$ \frac{\partial\det(L)}{\partial x_i}$$

From Matrix Book, can I use Jacobi's formula? $$\frac{\partial\det(L)}{\partial x}= \det (L) \operatorname{Tr}\left( L^{-1} \frac{\partial L}{\partial x}\right).$$

Is it correct that $\frac{\partial\det(L)}{\partial x_i} = \det (L) \operatorname{Tr}\left( L^{-1} \frac{\partial L}{\partial x_i}\right)$

where $\frac{\partial L}{\partial x_i}$ is the matrix with entries $\frac{\partial L_{k,l}}{\partial x_i}=0$ for $k \neq i$ and $l \neq i$

and $\frac{\partial L_{i,j}}{\partial x_i}=\frac{\partial L_{j,i}}{\partial x_i}=-\frac{(x_i-x_j)}{ \sigma ^2} L_{i,j}$?
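Assuming the derivative matrix above (nonzero only in row $i$ and column $i$), a small NumPy sketch with made-up data can be used to compare Jacobi's formula against a finite-difference estimate of $\det(L)$:

```python
import numpy as np

def kernel(x, sigma):
    d = x[:, None] - x[None, :]
    return np.exp(-d**2 / (2 * sigma**2))

def dL_dxi(x, L, sigma, i):
    # dL_{k,l}/dx_i is nonzero only in row i and column i;
    # dL_{i,j}/dx_i = dL_{j,i}/dx_i = -(x_i - x_j)/sigma^2 * L_{i,j}
    D = np.zeros_like(L)
    D[i, :] = -(x[i] - x) / sigma**2 * L[i, :]
    D[:, i] = D[i, :]              # symmetric; D[i, i] = 0
    return D

rng = np.random.default_rng(0)     # made-up test data
x, sigma, i = rng.normal(size=5), 0.8, 2
L = kernel(x, sigma)

# Jacobi's formula: d det(L)/dx_i = det(L) * tr(L^{-1} dL/dx_i)
jacobi = np.linalg.det(L) * np.trace(np.linalg.solve(L, dL_dxi(x, L, sigma, i)))

# Central finite difference on det(L)
eps = 1e-6
xp, xm = x.copy(), x.copy()
xp[i] += eps
xm[i] -= eps
fd = (np.linalg.det(kernel(xp, sigma)) - np.linalg.det(kernel(xm, sigma))) / (2 * eps)

print(jacobi, fd)                  # the two values should agree closely
```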

1 Answer


Collect the $x_i$ vectors into a single matrix $$X = \big[\matrix{x_1&x_2&\ldots&x_n}\big]$$ Following this post, define the Gram matrix and extract its main diagonal into a vector $$G=X^TX,\quad g=\operatorname{diag}(G)$$ Take the log of $L$ and differentiate $$\eqalign{ L &= \exp\left(\frac{2G -g{\tt1}^T -{\tt1}g^T}{2\sigma^2}\right) \\ 2\sigma^2\log(L) &= 2G -g{\tt1}^T -{\tt1}g^T \\ 2\sigma^2\left(\frac{dL}{L}\right) &= 2\,dG -dg\,{\tt1}^T -{\tt1}\,dg^T \\ dL &= \frac{1}{2\sigma^2}L\odot(2\,dG - dg\,{\tt1}^T - {\tt1}\,dg^T) \\ }$$

For later convenience, define the variables $$\eqalign{ \phi &= \det(L) \\ \alpha &= \left(\frac{\phi}{2\sigma^2}\right)\\ R &= L\odot L^{-T} \quad &\big({\rm Hadamard\,Product}\big)\\ P &= \Big(\operatorname{Diag}(R{\tt1})-R\Big) \quad &\big({\rm Laplacian\,of\,}R\big) \\ }$$

Start with the formula for the derivative of the determinant (Jacobi's formula) and substitute $dL$ from above. $$\eqalign{ d\phi &= \phi\,L^{-T}:dL \\ &= \phi\,L^{-T}:(2\,dG - dg\,{\tt1}^T - {\tt1}\,dg^T)\odot\frac{L}{2\sigma^2} \\ &=\alpha R:\left(2\,dG -dg\,{\tt1}^T -{\tt1}\,dg^T\right) \\ &= 2\alpha R:dG - 2\alpha R{\tt1}:dg \\ &= 2\alpha R:dG - 2\alpha R{\tt1}:\operatorname{diag}(dG) \\ &= 2\alpha \Big(R -\operatorname{Diag}(R{\tt1})\Big):dG \\ &= -2\alpha P:dG \\ &= -2\alpha P:(X^TdX+dX^TX) \\ &= -4\alpha P:X^TdX \\ &= -4\alpha XP:dX \\ \frac{\partial \phi}{\partial X} &= -4\alpha XP \\ &= -\left(\frac{2\phi}{\sigma^2}\right)XP \\ }$$ So that's the formula for the gradient wrt the $X$ matrix.
To find the gradient with respect to one of its columns, multiply by the standard basis vector $e_i$ $$\eqalign{ x_i &= Xe_i \\ \frac{\partial \phi}{\partial x_i} &= \left(\frac{\partial \phi}{\partial X}\right)e_i \;=\; -\left(\frac{2\phi}{\sigma^2}\right)XPe_i \\ }$$ NB: In several steps, a colon was used as a product notation for the trace operator, i.e. $$\eqalign{ A:B &= \operatorname{Tr}(A^TB) }$$ Use was also made of the fact that $\{G,L,R,P\}$ are all symmetric matrices.
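The colon-product manipulations above can also be spot-checked numerically with arbitrary made-up matrices, in particular $P:X^TdX = XP:dX$ and the step that merges the two terms of $P:(X^TdX+dX^TX)$ using $P=P^T$:

```python
import numpy as np

# Frobenius/colon product: A:B = tr(A^T B)
frob = lambda A, B: np.trace(A.T @ B)

rng = np.random.default_rng(1)     # arbitrary made-up shapes and data
d, n = 3, 5
X, dX = rng.normal(size=(d, n)), rng.normal(size=(d, n))
P = rng.normal(size=(n, n))
P = P + P.T                        # symmetric, as in the derivation

print(np.isclose(frob(P, X.T @ dX), frob(X @ P, dX)))                    # P : X^T dX = XP : dX
print(np.isclose(frob(P, X.T @ dX + dX.T @ X), 2 * frob(P, X.T @ dX)))   # uses P = P^T
```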

The matrix $R=\big(L\odot L^{-T}\big)$ is known as the relative gain array and has some interesting uses in control theory. One of its properties is $R{\tt1}={\tt1}$, which allows some terms to be simplified. $$\eqalign{ \operatorname{Diag}(R{\tt1}) &= I \\ P &= \big(I-R\big) \\ }$$
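As a sketch (with made-up dimensions, points, and $\sigma$), the closed-form gradient $-\left(\frac{2\phi}{\sigma^2}\right)XP$ can be compared against a finite-difference estimate, and the gradient for a single column $x_i$ is just the corresponding column of that matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, sigma = 3, 6, 0.9
X = rng.normal(size=(d, n))        # made-up points, one per column as in the answer

def det_L(X, sigma):
    G = X.T @ X
    g = np.diag(G)
    L = np.exp((2 * G - g[:, None] - g[None, :]) / (2 * sigma**2))
    return np.linalg.det(L), L

phi, L = det_L(X, sigma)
R = L * np.linalg.inv(L).T         # relative gain array (Hadamard product)
print(np.allclose(R @ np.ones(n), 1.0))          # RGA property: R1 = 1
P = np.diag(R @ np.ones(n)) - R    # Laplacian of R, here equal to I - R

grad = -(2 * phi / sigma**2) * (X @ P)           # d(det L)/dX

# Finite-difference check of a single entry X[a, i]
eps, a, i = 1e-6, 1, 4
Xp, Xm = X.copy(), X.copy()
Xp[a, i] += eps
Xm[a, i] -= eps
fd = (det_L(Xp, sigma)[0] - det_L(Xm, sigma)[0]) / (2 * eps)
print(grad[a, i], fd)              # should agree closely

# Gradient with respect to the column x_i = X e_i
print(grad[:, i])
```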

greg
  • Thanks a lot. However, I think you made an error here; it does not seem correct that $ -4P:X^TdX = -4XP:dX $ – matrix777 Mar 21 '20 at 09:40
  • $x_i$ is not a vector, but a scalar – matrix777 Mar 21 '20 at 10:42
  • If $x_i$ is a scalar that just means that $X$ is a (row) vector. Also, the cyclic property of the trace allows the terms of a colon product to be re-arranged in exactly the way you've highlighted. – greg Mar 21 '20 at 13:42
  • Thanks Greg, but then my question is: why not use Jacobi's formula? I see that it applies directly to my case (see also the question, where I suggested the solution). Here is Jacobi's formula: [link](https://mathoverflow.net/questions/214908/proof-for-the-derivative-of-the-determinant-of-a-matrix). I think the result will be the same using your derivation or using Jacobi. – matrix777 Mar 21 '20 at 22:36
  • I used the Jacobi formula in the very first step, i.e. $$d\phi = \phi L^{-T}:dL$$ – greg Mar 21 '20 at 22:53
  • Thanks, but I cannot see why the trace disappeared from your end result. I mean, how did we get from a trace, $\partial \phi = -4\alpha XP:dX $, to a product of matrices in the next line, $\frac{\partial \phi}{\partial X}= -4\alpha XP $? The Jacobi formula has the trace: $$\frac{\partial\det(L)}{\partial x}= \det (L) \operatorname{Tr}\left( L^{-1} \frac{\partial L}{\partial x}\right).$$ – matrix777 Mar 22 '20 at 01:00
  • The Jacobi formula, in the form you've written it, only works when $x$ is a scalar. If $x$ is a vector or matrix, then the quantity inside the trace function becomes a 3rd or 4th order tensor. How would you define the trace of a tensor? – greg Mar 24 '20 at 14:57