
I have a cost function which I want to differentiate with respect to a scalar,

$$\frac{d}{d\epsilon}J(\epsilon)=\frac{d}{d\epsilon} \|\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\|^2, $$

where $\Delta z$ is a vector, $H$ is a matrix and $\epsilon$ is a scalar. Does anyone know how to do this?

Ale
  • 1
  • Chain rule and search for the derivative of the inverse. It was asked quite often. – user251257 Mar 03 '16 at 23:46
  • The answer here to a related question has a nice short proof for the formula of the derivative. http://math.stackexchange.com/a/297649/251257 – user251257 Mar 03 '16 at 23:55
  • I don't think it's clear how to apply the chain rule here. So, it will be 2 times the absolute value term, times the derivative of the term where epsilon is included. But isn't the last term then a vector? – Ale Mar 04 '16 at 02:23

1 Answer


I'm not sure which tools you are familiar with, so this may go beyond them, but vectors and matrices can be differentiated much like scalars. The squared norm is just the inner product of a vector with itself, and both the chain rule and the product rule have analogs in this setting, so:

$$\frac{d}{d\epsilon}J(\epsilon)=\frac{d}{d\epsilon} \left\langle\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z, \Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right\rangle\\ =\left\langle\frac{d}{d\epsilon}\left(\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right), \Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right\rangle\\ +\left\langle\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z, \frac{d}{d\epsilon}\left(\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right)\right\rangle\\= 2\left\langle\frac{d}{d\epsilon}\left(\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right), \Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right\rangle$$

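Now, assuming that $\Delta z$ and $H$ do not depend on $\epsilon$ (so the only dependence on $\epsilon$ is through the $\frac{1}{\epsilon}I$ term), the derivative of the constant $\Delta z$ vanishes and the constant factors $H$ and $H^T\Delta z$ can be pulled outside the derivative: $$\frac{d}{d\epsilon}\left(\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right)\\=-H\frac{d}{d\epsilon}\left((H^TH+\frac{1}{\epsilon}I)^{-1}\right)H^T\Delta z$$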

The rule for differentiating a matrix inverse is more complicated than its scalar counterpart, because matrices need not commute: $\frac d{dt}A^{-1} = -A^{-1}\frac {dA}{dt}A^{-1}$. Fortunately, in this case the derivative of the inner matrix is a multiple of the identity, which commutes with everything, so we can simplify: $$\frac{d}{d\epsilon}\left((H^TH+\frac{1}{\epsilon}I)^{-1}\right) \\=-(H^TH+\frac{1}{\epsilon}I)^{-1} \left(\frac{d}{d\epsilon}(H^TH+\frac{1}{\epsilon}I)\right)(H^TH+\frac{1}{\epsilon}I)^{-1}\\=-(H^TH+\frac{1}{\epsilon}I)^{-1}\left(\frac{-1}{\epsilon^2}I\right)(H^TH+\frac{1}{\epsilon}I)^{-1}\\=\frac 1{\epsilon^2}(H^TH+\frac{1}{\epsilon}I)^{-2}\\=(\epsilon H^TH+I)^{-2}$$
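If you want to convince yourself of this identity numerically, a central finite difference is a quick sanity check. The following is only a minimal sketch: the function names, the random $5\times 3$ matrix $H$, the value $\epsilon = 0.7$ and the step size are all illustrative assumptions, not part of the question.

```python
import numpy as np

def inverse_derivative(H, eps):
    """Closed form derived above: d/d(eps) (H^T H + I/eps)^{-1} = (eps H^T H + I)^{-2}."""
    n = H.shape[1]
    M = np.linalg.inv(eps * H.T @ H + np.eye(n))
    return M @ M

def regularized_inverse(H, eps):
    """The matrix being differentiated: (H^T H + I/eps)^{-1}."""
    n = H.shape[1]
    return np.linalg.inv(H.T @ H + np.eye(n) / eps)

def inverse_derivative_fd(H, eps, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (regularized_inverse(H, eps + h) - regularized_inverse(H, eps - h)) / (2 * h)

# Illustrative data (an assumption for this check, not from the question).
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 3))
eps = 0.7

closed = inverse_derivative(H, eps)
numeric = inverse_derivative_fd(H, eps)
max_abs_diff = np.max(np.abs(closed - numeric))
print("largest entrywise difference:", max_abs_diff)
```

The printed difference should be tiny compared with the entries of the matrices themselves if the closed form is right.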

So: $$\frac{d}{d\epsilon}\left(\Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right)\\=-H(\epsilon H^TH+I)^{-2}H^T\Delta z$$

And:$$\frac{d}{d\epsilon}J(\epsilon) =2\left\langle -H(\epsilon H^TH+I)^{-2}H^T\Delta z, \Delta z -H(H^TH+\frac{1}{\epsilon}I)^{-1}H^T\Delta z\right\rangle$$
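The final formula can be checked the same way, by comparing it against a finite-difference derivative of $J$. Again this is just a sketch under assumptions: `residual`, `cost` and `cost_derivative` are hypothetical helper names, and the random $H$, $\Delta z$ and the value of $\epsilon$ are made up for the test.

```python
import numpy as np

def residual(H, dz, eps):
    """r(eps) = dz - H (H^T H + I/eps)^{-1} H^T dz."""
    n = H.shape[1]
    A = H.T @ H + np.eye(n) / eps
    return dz - H @ np.linalg.solve(A, H.T @ dz)

def cost(H, dz, eps):
    """J(eps) = ||r(eps)||^2."""
    r = residual(H, dz, eps)
    return r @ r

def cost_derivative(H, dz, eps):
    """dJ/d(eps) = 2 < -H (eps H^T H + I)^{-2} H^T dz, r(eps) >."""
    n = H.shape[1]
    M = eps * H.T @ H + np.eye(n)
    # Apply (eps H^T H + I)^{-2} with two linear solves instead of forming the inverse.
    v = np.linalg.solve(M, np.linalg.solve(M, H.T @ dz))
    dr = -(H @ v)  # derivative of the residual with respect to eps
    return 2.0 * (dr @ residual(H, dz, eps))

# Illustrative data (an assumption for this check, not from the question).
rng = np.random.default_rng(1)
H = rng.standard_normal((5, 3))
dz = rng.standard_normal(5)
eps = 0.7
h = 1e-6

analytic = cost_derivative(H, dz, eps)
numeric = (cost(H, dz, eps + h) - cost(H, dz, eps - h)) / (2 * h)
print("analytic:", analytic, "finite difference:", numeric)
```

Note that the result is a scalar, as it should be for the derivative of a scalar cost with respect to a scalar: the vector-valued derivative of the residual is paired with the residual itself through the inner product.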

Paul Sinclair
  • 43,643