
I am trying to derive the Newton step in an iterative optimization. I know the step is:

$$\Delta x=-H^{-1}g$$

where $H$ is the Hessian and $g$ is the gradient of a function $f(x)$ of a vector argument at $x$.

I also know the step is derived from second order Taylor expansion.

For example, take a function $f:\mathbb{R}^{3}\to\mathbb{R}$ of a vector argument $x$. Its second-order Taylor expansion is:

$$f(x+\Delta x)\approx f(x)+g^{T}\Delta x+ \frac{1}{2}(\Delta x)^{T}H(\Delta x)$$

where $g$ is the gradient and $H$ is the Hessian of $f$ at $x$.
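A quick numerical sanity check of this quadratic model (a sketch; the test function $f$, its gradient, and its Hessian below are my own made-up example, not from the question):

```python
import numpy as np

# Hypothetical test function f: R^3 -> R, with hand-computed derivatives.
def f(x):
    return x[0]**2 + 2*x[1]**2 + 3*x[2]**2 + x[0]*x[1] + np.sin(x[2])

def grad(x):
    return np.array([2*x[0] + x[1],
                     x[0] + 4*x[1],
                     6*x[2] + np.cos(x[2])])

def hess(x):
    return np.array([[2.0, 1.0, 0.0],
                     [1.0, 4.0, 0.0],
                     [0.0, 0.0, 6.0 - np.sin(x[2])]])

x = np.array([0.5, -0.3, 0.2])
dx = 1e-3 * np.array([1.0, 2.0, -1.0])

exact = f(x + dx)
model = f(x) + grad(x) @ dx + 0.5 * dx @ hess(x) @ dx
# The model matches f(x + dx) up to third-order terms in dx.
assert abs(exact - model) < 1e-8
```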

Taking the partial derivative with respect to $\Delta x$ should give:

$$\frac{\partial f(x+\Delta x)}{\partial \Delta x}=g^{T}+H(\Delta x)$$

The first term is clear, since $g^{T}$ multiplies $\Delta x$. But why

$$\frac{\partial}{\partial\Delta x} \frac{1}{2}(\Delta x)^{T}H(\Delta x)=H(\Delta x)$$ ?

Using the product rule leads me to a different result (omitting the $\frac{1}{2}$ for clarity):

$$\frac{\partial}{\partial\Delta x} (\Delta x)^{T}H(\Delta x)= \\ =\frac{\partial(\Delta x)^{T}}{\partial\Delta x}H(\Delta x)+(\Delta x)^{T}\frac{\partial H}{\partial\Delta x}(\Delta x)+(\Delta x)^{T}H\frac{\partial\Delta x}{\partial\Delta x}= \\ =1\cdot H\cdot\Delta x + 0 + (\Delta x)^{T}H\cdot 1= \\ =H\cdot\Delta x + (\Delta x)^{T}H$$

Libor
  • Something has to be very wrong, since the matrices you are adding at the end don't even have the same dimension! I blame your derivative of the transpose. –  Nov 07 '13 at 12:12
  • $g^{T}\Delta x$ is a $3\times 3$ matrix so is the second term of the Taylor polynomial. This is copied from a book, but the differentiation was missing and I don't know how they arrive at the results. – Libor Nov 07 '13 at 12:19
  • And $g^{T}+H(\Delta x)$ is a 3-vector since $H(\Delta x)$ is a matrix multiplied by a vector. – Libor Nov 07 '13 at 12:20
  • Here's a relevant thread. I recommend Did's answer there. Also, I show how to derive this result using the product rule. – littleO Nov 08 '13 at 01:16

4 Answers


I think you are having difficulty with how the gradient of a function and its Hessian are used in the Taylor expansion. Let $p=(x,y,z)$ and $\Delta p=(\Delta x,\Delta y, \Delta z)$, both written as row vectors. The second-order Taylor expansion is \begin{align} f(p+\Delta p) =& f(p)+\nabla f(p)\cdot(\Delta p)^{\;T}+\frac{1}{2}(\Delta p)\cdot\mathbf{H}f(p)\cdot(\Delta p)^{\;T}+r(\Delta p), \quad \lim_{\Delta p\to 0}\;\frac{r(\Delta p)}{\|\Delta p\|^2}=0. \end{align}

Remember, the gradient $\nabla f(p)$ of a function $f:\mathbb{R}^3\to \mathbb{R}$ at $p=(x,y,z)$ is $$ \nabla f(x,y,z)=\left(\dfrac{\partial f(x,y,z)}{\partial x},\dfrac{\partial f(x,y,z)}{\partial y}, \dfrac{\partial f(x,y,z)}{\partial z} \right) $$ and the total derivative of first order is \begin{align} \nabla f( p)\bullet (\Delta p) = & \left(\dfrac{\partial f(x,y,z)}{\partial x},\dfrac{\partial f(x,y,z)}{\partial y}, \dfrac{\partial f(x,y,z)}{\partial z} \right)\bullet (\Delta x,\Delta y, \Delta z) \\ = & \dfrac{\partial f(x,y,z)}{\partial x}\cdot \Delta x +\dfrac{\partial f(x,y,z)}{\partial y}\cdot \Delta y+ \dfrac{\partial f(x,y,z)}{\partial z}\cdot\Delta z \end{align} Recall that the inner product $\bullet$ of the vectors $\nabla f(x,y,z)$ and $(\Delta x,\Delta y, \Delta z)$ can be expressed as the product of a row matrix and a column matrix, $\nabla f(p)\cdot(\Delta p)^{\;T}$. The product of a row matrix and a column matrix is a $1\times 1$ matrix, which can be identified with a number. Here $\cdot$ denotes the matrix product. The Hessian is $$ \mathbf{H}f(x,y,z)= \left[ \begin{array}{ccc} \dfrac{\partial^2 f(x,y,z)}{\partial x\partial x} & \dfrac{\partial^2 f(x,y,z)}{\partial x\partial y} & \dfrac{\partial^2 f(x,y,z)}{\partial x\partial z} \\ \dfrac{\partial^2 f(x,y,z)}{\partial y\partial x} & \dfrac{\partial^2 f(x,y,z)}{\partial y\partial y} & \dfrac{\partial^2 f(x,y,z)}{\partial y\partial z} \\ \dfrac{\partial^2 f(x,y,z)}{\partial z\partial x} & \dfrac{\partial^2 f(x,y,z)}{\partial z\partial y} & \dfrac{\partial^2 f(x,y,z)}{\partial z\partial z} \\ \end{array} \right]. $$ The second-order term is then given by \begin{align} \Delta p\cdot\mathbf{H}f(p)\cdot(\Delta p)^{\;T}= & (\Delta x,\Delta y, \Delta z)\; \mathbf{H} f(x,y,z) \left(\begin{array}{c}\Delta x\\\Delta y\\ \Delta z\end{array}\right) \\ \end{align}
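To make these definitions concrete, the gradient and Hessian can also be approximated numerically with central differences; the function $f$ below is just a hypothetical example for illustration:

```python
import numpy as np

# Hypothetical example function f: R^3 -> R.
def f(p):
    x, y, z = p
    return x*y + y*z**2 + np.exp(x)

def fd_grad(f, p, eps=1e-6):
    # Central differences for each partial derivative.
    return np.array([(f(p + eps*e) - f(p - eps*e)) / (2*eps)
                     for e in np.eye(3)])

def fd_hess(f, p, eps=1e-4):
    # The Hessian is the Jacobian of the gradient.
    return np.array([(fd_grad(f, p + eps*e) - fd_grad(f, p - eps*e)) / (2*eps)
                     for e in np.eye(3)])

p = np.array([0.3, -0.7, 1.1])
H = fd_hess(f, p)
# Mixed partials commute, so the Hessian is (numerically) symmetric.
assert np.allclose(H, H.T, atol=1e-4)
```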

Elias Costa

Why not work in small but certain steps? Simply start without the vector notation: $$ f(x+\Delta x,y+\Delta y,z + \Delta z) \approx f(x,y,z) + \Delta x \frac{\partial f}{\partial x} + \Delta y \frac{\partial f}{\partial y} + \Delta z \frac{\partial f}{\partial z}\\ + \frac{1}{2}(\Delta x)^2 \frac{\partial^2 f}{\partial x^2} + \frac{1}{2}(\Delta y)^2 \frac{\partial^2 f}{\partial y^2} + \frac{1}{2}(\Delta z)^2 \frac{\partial^2 f}{\partial z^2}\\ + (\Delta x)(\Delta y) \frac{\partial^2 f}{\partial x \partial y} + (\Delta y)(\Delta z) \frac{\partial^2 f}{\partial y \partial z} + (\Delta z)(\Delta x) \frac{\partial^2 f}{\partial z \partial x} $$ Then convert this to the vector notation and check whether you can reproduce the above result: $$ f(x+\Delta x,y+\Delta y,z + \Delta z) \approx f(x,y,z) + \left[ \begin{array}{ccc} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} & \frac{\partial f}{\partial z} \end{array} \right] \left[ \begin{array}{c} \Delta x \\ \Delta y \\ \Delta z \end{array} \right] + \frac{1}{2} \left[ \begin{array}{ccc} \Delta x & \Delta y & \Delta z \end{array} \right] \left[ \begin{array}{ccc} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial x \partial z} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} & \frac{\partial^2 f}{\partial y \partial z} \\ \frac{\partial^2 f}{\partial z \partial x} & \frac{\partial^2 f}{\partial z \partial y} & \frac{\partial^2 f}{\partial z^2} \end{array}\right] \left[ \begin{array}{c} \Delta x \\ \Delta y \\ \Delta z \end{array} \right] $$
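Evaluating the component-by-component sum and the vector form on the same concrete function confirms they agree; the $f$ below is my own hypothetical example, not from the answer:

```python
import numpy as np

# My own example function and its hand-computed derivatives at p.
def f(x, y, z):
    return np.sin(x)*y + z**3 + x*z

p = np.array([0.4, 1.2, -0.5])
d = np.array([1e-3, -2e-3, 1.5e-3])
x, y, z = p

g = np.array([np.cos(x)*y + z, np.sin(x), 3*z**2 + x])
H = np.array([[-np.sin(x)*y, np.cos(x), 1.0],
              [np.cos(x),    0.0,       0.0],
              [1.0,          0.0,       6*z ]])

# Scalar form: the component-by-component sum from the first display.
scalar = (f(*p) + d[0]*g[0] + d[1]*g[1] + d[2]*g[2]
          + 0.5*d[0]**2*H[0, 0] + 0.5*d[1]**2*H[1, 1] + 0.5*d[2]**2*H[2, 2]
          + d[0]*d[1]*H[0, 1] + d[1]*d[2]*H[1, 2] + d[2]*d[0]*H[2, 0])

# Vector form: f(p) + g^T d + (1/2) d^T H d.
vector = f(*p) + g @ d + 0.5 * d @ H @ d

assert np.isclose(scalar, vector)
```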

Han de Bruijn
  • Thanks for clarifying the expansion, yet I need to know how to compute derivative of the second-order term with respect to $[ \Delta x ; \Delta y ; \Delta z]$... – Libor Nov 07 '13 at 16:26

For one thing, I would say $H\Delta x$ should be $(\Delta x)^TH$. In matrix calculus, the derivative of a scalar with respect to a column vector is a row vector. So by the product rule, $$(x^THx)'=(Hx)^T+x^TH=x^T(H+H^T)$$
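A finite-difference sketch showing where the $H+H^T$ comes from, using a deliberately non-symmetric random $H$ (this example is mine, not part of the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 3))   # deliberately not symmetric
x = rng.standard_normal(3)

def q(v):
    return v @ H @ v              # the scalar v^T H v

# Central-difference gradient of the quadratic form.
eps = 1e-6
g_fd = np.array([(q(x + eps*e) - q(x - eps*e)) / (2*eps)
                 for e in np.eye(3)])

# The two product-rule terms H x and x^T H are transposes of one another;
# collected as a single column vector they give (H + H^T) x.
assert np.allclose(g_fd, (H + H.T) @ x, atol=1e-5)
```

For a symmetric Hessian this reduces to $2Hx$, which is why the $\frac12$ in the Taylor term cancels.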


As for how to derive Newton's iterative formula, note that the function attains its extremum where its first-order derivative vanishes: $$\nabla^Tf(x)+(\Delta x)^TH=0.$$ Transposing, and using $H^T=H$, gives $$\Delta x=-H^{-1}\nabla f$$
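The resulting iteration can be sketched numerically; the strictly convex toy objective below is my own choice, and the code solves $H\,\Delta x = -g$ rather than forming $H^{-1}$ explicitly:

```python
import numpy as np

# Toy objective f(x) = sum(exp(x_i)) + 0.5*||x||^2 (strictly convex),
# so grad f = exp(x) + x and Hess f = diag(exp(x) + 1).
def grad(x):
    return np.exp(x) + x

def hess(x):
    return np.diag(np.exp(x) + 1.0)

x = np.array([1.0, -2.0, 0.5])
for _ in range(8):
    # Newton step: solve H dx = -g for dx.
    dx = np.linalg.solve(hess(x), -grad(x))
    x = x + dx

# At the minimizer the gradient vanishes (quadratic convergence).
assert np.linalg.norm(grad(x)) < 1e-10
```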

Shuchang
  • I have expanded the question - I am trying to derive Newton iteration step (formula for $\Delta x$) derived from second-order Taylor expansion. I can't find appropriate answer on how to get to that formula... – Libor Nov 07 '13 at 16:39
  • @Libor I've expanded my answer. – Shuchang Dec 03 '13 at 03:45

Try using indices; that should clarify it. So your equation is

$$ f(x+\Delta x)\approx f(x)+g_i (\Delta x)_i + \frac{1}{2} (\Delta x)_i H_{ij} (\Delta x)_j $$

where we sum over repeated indices. Hence, using $\frac{\partial (\Delta x)_i}{\partial (\Delta x)_k}=\delta_{i,k}$, where $\delta_{i,k}$ is the Kronecker delta, we find

$$ \frac{\partial f(x+\Delta x)}{\partial (\Delta x)_k}=g_k + \frac{1}{2} H_{kj} (\Delta x)_j + \frac{1}{2} (\Delta x)_i H_{ik} $$

Using $H_{ik}=H_{ki}$ and renaming the summation index $i$ to $j$, we find the correct result.
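The index computation can be checked with `numpy.einsum`, which implements exactly this summation convention (a sketch; the random $g$, $H$, and $\Delta x$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
H = A + A.T                       # symmetric, like a Hessian
g = rng.standard_normal(3)
dx = rng.standard_normal(3)

def model(d):
    # g_i d_i + (1/2) d_i H_ij d_j  (the constant f(x) is omitted)
    return g @ d + 0.5 * np.einsum('i,ij,j->', d, H, d)

# Finite-difference derivative of the model with respect to d.
eps = 1e-6
fd = np.array([(model(dx + eps*e) - model(dx - eps*e)) / (2*eps)
               for e in np.eye(3)])

# g_k + (1/2) H_kj d_j + (1/2) d_i H_ik = g_k + (H d)_k since H_ik = H_ki.
assert np.allclose(fd, g + H @ dx, atol=1e-5)
```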