
I have the following problem: given the map $$p \mapsto p^T g + p^T S p$$ where the first term is a dot product and the second is a matrix product, I would like to find its critical points.

[The context, for those interested, is that $g$ and $S$ represent the gradient and the Hessian, respectively, of a twice differentiable map from $\mathbb R^n$ to $\mathbb R$.]

My approach is to differentiate the map w.r.t. $p $. Then, assuming that $S $ is symmetric, I get that for each $p$ the derivative is the map

$$q \mapsto q^Tg + 2 p^T S q + q^T S q$$

Now how can I find a $p $ such that this represents the constant zero function?

Thanks in advance!

EDIT

Here is how I tried to calculate the derivative of the function at a given point $p $

To begin with, my definition of a derivative of a map $m$ from $\mathbb R^n$ to $\mathbb R$ at a point $p$ is a linear map $A$ from $\mathbb R^n $ to $\mathbb R$ such that

$$\lim_{h \to 0 } \frac{|m(p+h) - m(p) - Ah|}{|h|}=0$$

My approach is to differentiate the two maps $p \mapsto p^Tg $ and $p \mapsto p^TS p$ individually and then add those two functions. And we differentiate them by finding a function that "fits" the definition given above.

Since the map $p \mapsto p ^T g$ is linear it is immediate that

$$\lim_{h \to 0} \frac{|(p+h)^T g - p^T g - h^T g|}{|h|} = \lim_{h \to 0}\frac{0}{|h|} = 0$$

and thus that the map is its own derivative.

For the second term I assume that $S$ is symmetric [a similar derivation would work otherwise]. Then since

$$(p+h)^T S (p+h) = p^T S p + 2 p^T S h + h^T S h$$

we have, similarly to the above, that

$$\lim_{h \to 0 } \frac{|(p+h)^T S (p+h) - p^TS p - (2 p^T S h + h^T S h)| }{|h|}= \lim_{h \to 0 } \frac{0}{|h|}=0$$

and so the map $q \mapsto 2 p^T S q + q^T S q$ would be a derivative at the point $p$ of the map $q \mapsto q^T S q$.

Then combining those two derivatives we get the map

$$q \mapsto q^Tg + 2 p^T S q + q^T S q$$

is the derivative of $$q \mapsto q^T g + q^T S q$$

at the point $p$.

SECOND EDIT

I found the error I made above. Since the map

$$q \mapsto q^T S q $$

isn't linear we cannot have it in the derivative of the function

$$q \mapsto q^T S q $$

as this would make the map I stated as the derivative at the point $p$, namely $q \mapsto 2 p^T S q + q^T S q$, nonlinear.

Instead we simply use that $\lim_{h \to 0} h^T S h = 0$ (see here), and in fact that $|h^T S h| \le \|S\|\,|h|^2$, so the quotient below also tends to zero. Thus we have

$$\lim_{h \to 0 } \frac{|(p+h)^T S (p+h) - p^TS p - 2 p^T S h | }{|h|}= \lim_{h \to 0 } \frac{|h^T S h|}{|h|}=0$$
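
As a rough numerical sanity check of this limit, the sketch below evaluates the remainder quotient for shrinking steps $h = t d$. The matrix `S`, the point `p`, and the direction `d` are random placeholders chosen only for illustration; they are not part of the original problem.

```python
import numpy as np

# Hypothetical test data: a random symmetric S, a point p, a unit direction d.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
S = (A + A.T) / 2                 # symmetric, as assumed in the question
p = rng.standard_normal(n)
d = rng.standard_normal(n)
d /= np.linalg.norm(d)


def quad(x):
    """The quadratic term x^T S x."""
    return x @ S @ x


def remainder_ratio(t):
    """|quad(p+h) - quad(p) - 2 p^T S h| / |h| for h = t * d."""
    h = t * d
    return abs(quad(p + h) - quad(p) - 2 * p @ S @ h) / np.linalg.norm(h)


steps = [10.0 ** -k for k in range(1, 6)]
ratios = [remainder_ratio(t) for t in steps]
for t, r in zip(steps, ratios):
    print(f"t = {t:.0e}   ratio = {r:.3e}")
```

The quotient equals $t\,|d^T S d|$ up to rounding, so it should shrink roughly in proportion to $t$, which is what the linear candidate $q \mapsto 2 p^T S q$ needs in order to be the derivative.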

Mikosch

2 Answers


Why do you have $p$ and $q$ in your formula? The gradient should be something like $p \mapsto g + (S+S^T) p$. Setting it to zero gives $p = -(S+S^T)^{-1} g$, which equals $-\frac{1}{2} S^{-1} g$ when $S$ is symmetric, assuming that $S+S^T$ has full rank (otherwise you can use something like the Moore-Penrose inverse).
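
A minimal NumPy sketch of this recipe, under the assumption of a randomly generated symmetric positive definite `S` and a random `g` (both are placeholders, not data from the question):

```python
import numpy as np

# Hypothetical example data: S is built to be symmetric positive definite,
# so S + S^T is guaranteed to be invertible.
rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))
S = A @ A.T + n * np.eye(n)
g = rng.standard_normal(n)


def gradient(p):
    """Gradient of p -> p^T g + p^T S p, written for a general square S."""
    return g + (S + S.T) @ p


# Critical point from the linear system (S + S^T) p = -g.
p_star = np.linalg.solve(S + S.T, -g)

# The same point via the Moore-Penrose pseudoinverse, which also applies
# when S + S^T is rank deficient (it then returns a least-squares solution).
p_pinv = -np.linalg.pinv(S + S.T) @ g

print("critical point:", p_star)
print("gradient norm at critical point:", np.linalg.norm(gradient(p_star)))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; the pseudoinverse line is only there to mirror the fallback mentioned above.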

P. Quinton
  • maybe to clarify: The derivative of a map $p \mapsto $ something is still a map $p \mapsto $ something else. – ViktorStein Feb 08 '20 at 13:04
  • @P.Quinton I get that the derivative at a given point $p $ depends on $p $ - see the derivation I added above. You seem to arrive at the correct answer, but could you expand a little? Why do you calculate the gradient and not the derivative as I tried? And how is the gradient calculated? – Mikosch Feb 10 '20 at 16:55
  • In the $\mathbb R^n$ case, the derivative doesn't exist; instead you can compute the derivative in one direction. If you do this in the directions $e_i$, the standard orthonormal basis vectors, then you get the gradient. You can check this out: https://math.stackexchange.com/questions/222894/how-to-take-the-gradient-of-the-quadratic-form – P. Quinton Feb 10 '20 at 18:23

First, let's check your answer. Suppose $n=1$, so that $p, S$ and $g$ are also scalars. Your original map can be written as $$\phi(p) = pg + p^2 S.$$ Thus $$\phi'(p) = g + 2p S,$$ which does not match your expression. In fact, I don't understand why you've defined it in terms of a different variable while keeping the old one in the expression.

Now let $\Phi(p) = p^T g + p^T S p$ be your actual expression to be differentiated. Another indication that you've done the differentiation incorrectly is the dimensionality, which is always good to check. Your expression for the derivative is a scalar, whereas $\Phi(p)$ is a scalar and $p = (p_1, \dots, p_n)$ is a vector, so your answer should be an $n$-dimensional gradient with entries $\partial \Phi(p) / \partial p_i$.

The answer is $$\nabla \Phi(p) = g^T + 2p^T S,$$ which I've written as a row vector by convention. Check out Propositions 7 and 8 in this document. You can solve the equation now.
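
For completeness, here is a short sketch of that last step, assuming (as in the question) that $S$ is symmetric, and additionally that $S$ is invertible. Setting the row vector to zero and transposing gives

$$g^T + 2 p^T S = 0 \iff g + 2 S p = 0 \iff p = -\tfrac{1}{2} S^{-1} g.$$

In the scalar case $n = 1$ this reduces to $g + 2pS = 0$, that is, $p = -\frac{g}{2S}$, which agrees with $\phi'(p) = g + 2pS$ above.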

snar