
Suppose $A \in \mathbb{R}^{n\times n}$ is symmetric, $x, b \in \mathbb{R}^n$, and $c \in \mathbb{R}$. I would like to compute $\nabla f$, where $$f = \frac{1}{2}x^T A x - x^Tb + c.$$ My issue is with how the derivative is calculated. I have seen $$\frac{\partial}{\partial x} x^TAx = x^T(A+A^T)$$ and also $$\frac{\partial}{\partial x} x^TAx = (A+A^T)x.$$ Which one is correct? In my reading I more frequently see this derivative written as $x^T(A+A^T)$, but I also see $\nabla f = Ax - b$ (for example, on Wikipedia). The first formula does not seem to imply this expression for $\nabla f$: shouldn't we have $\nabla f = x^TA - b$?

Thanks.

CBBAM
  • Note that these two expressions are the transpose of each other, and since A is symmetric the transpose of the left hand side is itself – Aphyd Jan 07 '22 at 23:10
  • @Aphyd Which left hand side? – CBBAM Jan 07 '22 at 23:13
  • x^T A x. But now I've forgotten if transpose and derivatives commute – Aphyd Jan 07 '22 at 23:17
  • @Aphyd In terms of dimensions, since $x^TA$ is $1 \times n$ how can it be the same as $Ax$ (which has dimension $n \times 1$) unless they are row/column representations of one another. – CBBAM Jan 07 '22 at 23:22
  • Good point. To be honest in my classes my professors always just explain derivatives with respect to vectors as "just temporarily pretend the vector is a number and differentiate as usual" which has always been annoying and not very satisfactory. Hopefully someone who knows more than me about this will come along. – Aphyd Jan 07 '22 at 23:28
  • The last of your three displays is the derivative of $x^T A x$ with respect to $x^T$, not with respect to $x$. See e.g. this. – Jakob Streipel Jan 07 '22 at 23:43
  • (Assuming numerator layout, anyway. In column layout it's the opposite.) – Jakob Streipel Jan 07 '22 at 23:54
  • @prets Ah that makes sense, thank you! – CBBAM Jan 08 '22 at 00:00

1 Answer


There are no correct or incorrect computations here; it is mostly a matter of convention. You may arrange your partial derivatives in a column vector or in a row vector. The important thing is to stick to your convention throughout: the choice imposes a certain structure on the chain rule, for instance. The relevant Wikipedia page is the place to learn more about the two options.
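To make this concrete for the $f$ in the question, here is a minimal numerical sketch, assuming NumPy is available. The random test data and the helper name `numerical_gradient` are illustrative choices of mine, not part of the question. It compares the column form $Ax - b$ and the row form $x^TA - b^T$ against a finite-difference approximation of the partial derivatives of $f$.

```python
import numpy as np

# Illustrative random data (an assumption, not from the question).
rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2                    # symmetrize so that A == A.T
b = rng.standard_normal((n, 1))      # column vector, shape (n, 1)
c = 1.5                              # scalar constant
x = rng.standard_normal((n, 1))      # column vector, shape (n, 1)

def f(x):
    """f(x) = 1/2 x^T A x - x^T b + c, returned as a Python float."""
    return (0.5 * x.T @ A @ x - x.T @ b).item() + c

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the partial derivatives of f at x."""
    g = np.zeros(x.shape[0])
    for i in range(x.shape[0]):
        e = np.zeros_like(x)
        e[i, 0] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

grad_fd = numerical_gradient(f, x)

# Column convention: the partials stacked as an (n, 1) column.
grad_col = A @ x - b
# Row convention: the same partials laid out as a (1, n) row.
grad_row = x.T @ A - b.T

print(grad_col.shape, grad_row.shape)
print(np.allclose(grad_fd, grad_col.ravel(), atol=1e-6))
print(np.allclose(grad_row.T, grad_col))
```

Both layouts hold the same $n$ partial derivatives; they differ only in shape, and for symmetric $A$ one is the transpose of the other. Note that in the row convention $b$ also has to be transposed, which is why $x^TA - b$ in the question does not match dimensionally.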

Steph