How can I minimize the following function with respect to $x$ (not $x_0$)? This isn't homework: the book gives the answer but doesn't explain how it was obtained, and I'd like to understand the steps.
$$ f(x_0) + (x - x_0)^T \nabla f(x_0) + \frac{1}{2}(x - x_0)^T H (x - x_0) $$
Here's the answer in the book:
$$ x^* = x_0 - \nabla f(x_0) H^{-1} $$
I was able to reach the same answer, but I'm pretty sure I didn't use the right rules. If someone could explain how to derive the formula, especially the tricky last term, it would be greatly appreciated.
My main question concerns the fact that the quadratic term at the end contains the inner expression $x - x_0$ rather than a plain variable. I know how to differentiate it if $x - x_0$ were just a single vector: the derivative with respect to that vector would be $(H + H^T)$ times the vector. I got this from (here) and also worked it out myself.
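For reference, here is a rough sketch of how the chain rule would handle the inner expression. The symbols $d$ and $g$ are shorthand introduced only for this sketch, and it assumes that $\nabla f(x_0)$ is a column vector, that $H$ does not depend on $x$, and that $H$ is symmetric and invertible (positive definite if the stationary point is to be a minimum). Write $d = x - x_0$ and $g = \nabla f(x_0)$. Since $\partial d / \partial x = I$, differentiating with respect to $x$ gives the same result as differentiating with respect to $d$:
$$ \nabla_x \left[ f(x_0) + d^T g + \frac{1}{2} d^T H d \right] = g + \frac{1}{2}(H + H^T)\, d = g + H (x - x_0) $$
Setting this to zero gives $H (x - x_0) = -g$, so
$$ x^* = x_0 - H^{-1} \nabla f(x_0) $$
Under these assumptions the inverse multiplies the gradient from the left. The book's ordering, $\nabla f(x_0) H^{-1}$, would be consistent with this if the gradient is treated as a row vector there, which I can't confirm from the text.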
Thanks!