
I'm trying to wrap my head around how to apply the product rule to matrix-valued or vector-valued functions.

Specifically, I'm trying to work through how to apply the product rule to $$x^TAx = f(x)g(x)$$ where $f(x) = x^T$, $g(x)=Ax$, $x\in\mathbb{R}^N$, and $A\in \mathbb{R}^{N\times N}$.

I know that $\nabla_x x^TAx = (A + A^T)x$ or $x^T(A + A^T)$, depending on the layout, but I'm using this as an example to see whether I can get the same result with the product rule.
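The known result can at least be confirmed numerically before worrying about the product rule. A minimal sketch in plain Python (no libraries); the matrix $A$, the point $x$ and the step size are arbitrary illustrative values, and $A$ is deliberately non-symmetric so that the common shortcut $2Ax$ visibly fails:

```python
# Finite-difference check that grad(x^T A x) = (A + A^T) x.
# A, x and h are arbitrary illustrative values; A is deliberately non-symmetric.

def quad_form(A, x):
    """Return the scalar x^T A x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numeric_gradient(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x."""
    grad = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def closed_form_gradient(A, x):
    """Return (A + A^T) x as a list."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]
x = [0.3, -1.2, 2.0]

num = numeric_gradient(lambda v: quad_form(A, v), x)
exact = closed_form_gradient(A, x)
naive = [2 * sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]  # 2Ax, valid only for symmetric A

print(max(abs(a - b) for a, b in zip(num, exact)))  # close to zero
print(max(abs(a - b) for a, b in zip(num, naive)))  # clearly nonzero for this A
```

Since $x^TAx$ is quadratic, the central difference is exact up to rounding, so any visible mismatch would point to a wrong formula rather than to the step size.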

This question gives the product rule for scalar-valued functions as $$f(x)\nabla_x g(x)+g(x)\nabla_x f(x).$$

However, the dimensions don't work out when I plug $f$ and $g$ into the formula above. As Travis wrote in the comment below, we should have:

$$ \nabla_x(x^TAx) = (\nabla_x x^T)Ax + x^T\nabla_x(Ax) $$

However, that still leaves an $x$ in the first term and an $x^T$ in the second. I don't see how these terms can conform, or how they combine to give $(A + A^T)x$ or $x^T(A + A^T)$.

This question is essentially asking the same thing, but the answer doesn't really involve the product rule above. I figure there must be some general formula to apply, as with scalar-valued functions.

Am I writing the product rule correctly in this case? Is there something I'm missing or doing incorrectly?

EDIT:

Building on Algebraic Pavel's answer, I think the problem is that you have to formulate the functions $f(x)$ and $g(x)$ so that they are in the same space.

That is, for $f,g:\mathbb{R}^N\rightarrow \mathbb{R}^M$, the product rule is:

$$\nabla_x (f(x)^Tg(x)) = f(x)^T\nabla_x g(x) + g(x)^T \nabla_x f(x)$$

So in the example above, if we let $f(x) = x$, $g(x)=Ax$, then the formula holds.

As another example, consider $$Axx^T$$ and let $f(x) = x^T A^T$ and $g(x) = x^T$. We have both $f,g:\mathbb{R}^{N\times 1} \rightarrow \mathbb{R}^{1\times N}$ and

$$\nabla_x (f(x)^Tg(x)) = \nabla_x (Axx^T) = Ax + xA^T$$

which holds. Notice that if we set $f(x) = Ax$ instead of $f(x) = (Ax)^T$, the rule falls apart.

I still don't know whether this holds in all instances, though. Any counterexamples?

  • The rule is formally the same as for scalar-valued functions, so that $$\nabla_X (x^T A x) = (\nabla_X x^T) A x + x^T \nabla_X(A x) .$$ We can then apply the product rule to the second term again. NB if $A$ is symmetric we can simplify the final expression using $\nabla_X (x^T) = (\nabla_X x)^T$. – Travis Willse May 04 '18 at 13:39
  • But doesn't that still leave you with an $x^T$ in one expression and an $x$ in another? I'm just not seeing how they conform... but I know I'm clearly missing something. We know that the answer is $(A + A^T)x$ – measure_theory May 04 '18 at 13:42
  • Your notation is rather misleading, especially using $X$ in place of $x$ as the direction of differentiation. I see now that you're asking about another quantity altogether. – Travis Willse May 04 '18 at 14:51
  • Should be fixed now. – measure_theory May 04 '18 at 14:53
  • There is a very general rule for the differential of a product $$d(A\star B)=dA\star B + A\star dB$$ where $\star$ is any kind of product (matrix, Hadamard, Frobenius, Kronecker, dyadic, etc.) and the quantities $(A,B)$ can be scalars, vectors, matrices, or tensors. There is no general rule for the gradient of a product. – greg May 04 '18 at 17:58
  • In order to make sense of a direct application of the product rule as you’re trying to do, you first have to define what it means to apply $\nabla$ to a vector. What do you mean by it in your equations? – amd May 04 '18 at 20:46
  • In your edit, how do you define the gradient of a matrix? – Algebraic Pavel May 05 '18 at 10:28
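greg's differential rule sidesteps the question of how to define the gradient of a matrix: for the matrix-valued $F(x) = Axx^T$ it gives $dF = (A\,dx)\,x^T + (Ax)\,dx^T$, one $N\times N$ matrix per direction $dx$. A minimal plain-Python sketch that checks this against a central finite difference along an arbitrary direction $v$; all numerical values and helper names are illustrative assumptions:

```python
# Check greg's differential product rule on the matrix-valued F(x) = (A x) x^T:
#   dF = (A dx) x^T + (A x) dx^T
# using a central finite difference along an arbitrary direction v.

def matvec(A, x):
    """Return A x as a list."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def outer(u, w):
    """Return the outer product u w^T as a nested list."""
    return [[ui * wj for wj in w] for ui in u]

def F(A, x):
    """Matrix-valued function F(x) = (A x) x^T."""
    return outer(matvec(A, x), x)

def directional_fd(A, x, v, h=1e-6):
    """Central-difference approximation of dF along direction v."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    Fp, Fm = F(A, xp), F(A, xm)
    n = len(x)
    return [[(Fp[i][j] - Fm[i][j]) / (2 * h) for j in range(n)] for i in range(n)]

def product_rule_differential(A, x, v):
    """d(Ax) x^T + (Ax) d(x^T), with d(Ax) = A v and d(x^T) = v^T."""
    P = outer(matvec(A, v), x)
    Q = outer(matvec(A, x), v)
    n = len(x)
    return [[P[i][j] + Q[i][j] for j in range(n)] for i in range(n)]

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, 2.0]]
x = [0.3, -1.2, 2.0]
v = [1.0, 0.5, -0.25]  # arbitrary direction dx

fd = directional_fd(A, x, v)
rule = product_rule_differential(A, x, v)
err = max(abs(fd[i][j] - rule[i][j]) for i in range(3) for j in range(3))
print(err)  # should be tiny (rounding error only)
```

Because $F$ is quadratic in $x$, the finite difference is exact up to floating-point rounding, so agreement here confirms the differential form of the rule for this product.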



It all depends on the conventions you use. Examining the product rule component by component shows that, in this case, it gives $$ \tag{1} \nabla_x[f(x)^Tg(x)]=f(x)^T\nabla_xg(x)+g(x)^T\nabla_x f(x). $$ So with $f(x):=x$ and $g(x):=Ax$, for which $\nabla_x f = I$ and $\nabla_x g = A$, we have $$ \nabla_x(x^TAx)=x^TA+x^TA^T=x^T(A+A^T). $$
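For this specific example, the same component-by-component bookkeeping resolves the $x$ versus $x^T$ mismatch from the question. With $e_j$ denoting the $j$th standard basis vector,
$$
\frac{\partial}{\partial x_j}(x^TAx) = e_j^TAx + x^TAe_j = (Ax)_j + (x^TA)_j = (x^TA^T)_j + (x^TA)_j ,
$$
so the term that looked like it carried a column $x$ actually contributes the row vector $(Ax)^T = x^TA^T$. Collecting the components $j=1,\dots,N$ into a row gives $\nabla_x(x^TAx) = x^TA^T + x^TA = x^T(A+A^T)$, and transposing gives the column-layout form $(A+A^T)x$.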


If $f,g:\mathbb{R}^n\to\mathbb{R}^m$, then $$ \frac{\partial}{\partial x_j}f^Tg= \frac{\partial}{\partial x_j}\sum_{i=1}^mf_ig_i= \sum_{i=1}^m\left(f_i\frac{\partial g_i}{\partial x_j}+g_i\frac{\partial f_i}{\partial x_j}\right). $$ So defining $$ \nabla_x f=\left(\frac{\partial f_i}{\partial x_j}\right)_{ij}, $$ the $m\times n$ Jacobian matrix, makes the right-hand side the $j$th entry of the row vector $f^T\nabla_x g+g^T\nabla_x f$, which gives (1).
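Rule (1) with this Jacobian convention can be checked numerically for a nonlinear pair as well. A minimal plain-Python sketch; the functions `f` and `g` below are hypothetical examples $\mathbb{R}^3\to\mathbb{R}^2$ chosen only for illustration, and all derivatives are central finite differences:

```python
import math

def f(x):
    # hypothetical example f: R^3 -> R^2
    return [math.sin(x[0]) * x[1], x[2] ** 2]

def g(x):
    # hypothetical example g: R^3 -> R^2
    return [math.exp(x[0] - x[2]), x[1] * x[2] + x[0]]

def jacobian(func, x, h=1e-5):
    """Numerical Jacobian J[i][j] = d func_i / d x_j (central differences)."""
    m = len(func(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = func(xp), func(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def row_times_matrix(u, M):
    """Row vector u^T times matrix M, returned as a list."""
    return [sum(u[i] * M[i][j] for i in range(len(u))) for j in range(len(M[0]))]

def dot_fg(v):
    """Scalar f(v)^T g(v)."""
    return sum(a * b for a, b in zip(f(v), g(v)))

x = [0.4, -0.7, 1.1]
fx, gx = f(x), g(x)

# Right-hand side of (1): f^T J_g + g^T J_f, a 1 x n row vector
rule = [a + b for a, b in zip(row_times_matrix(fx, jacobian(g, x)),
                              row_times_matrix(gx, jacobian(f, x)))]

# Left-hand side of (1): gradient of the scalar f^T g, computed directly
direct = jacobian(lambda v: [dot_fg(v)], x)[0]

print(max(abs(a - b) for a, b in zip(rule, direct)))  # should be close to zero
```

The two sides are computed along different routes (assembling Jacobians versus differentiating the scalar $f^Tg$ directly), so their agreement is a sanity check on (1) for this particular pair, not a proof.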