I'm trying to prove that if $f(x) = x^TBx$, then $f'(x) = (B + B^T)x$. I haven't found this formula online, but it is what I get when I go through the calculation in index notation. It simplifies to $2Bx$ when $B$ is symmetric. The accepted response to this discussion says that the solution is actually $f'(x) = x^T(B + B^T)$. Going through the proof there, I see how he got that result, but I can't see where the mistake in mine is.
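For reference, this is a sketch of the index-notation calculation I mean. The index names $i$, $j$, $k$ are my own choice, and I am treating $f'(x)$ as the vector of partial derivatives $\partial f / \partial x_k$:

$$f(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i B_{ij} x_j$$

$$\frac{\partial f}{\partial x_k} = \sum_{j=1}^{n} B_{kj} x_j + \sum_{i=1}^{n} x_i B_{ik} = (Bx)_k + (B^Tx)_k = \big((B + B^T)x\big)_k$$

The first sum comes from differentiating the factor $x_i$ (only $i = k$ survives) and the second from differentiating $x_j$ (only $j = k$ survives).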
The setup
- $x \in \mathbb{R}^n$ is always a column vector
- $B \in \mathbb{R}^{n \times n}$ is not necessarily symmetric
My approach
Let $g(x)=x^TB$ and $h(x)=x$, so that $f(x)=g(x)h(x)$. Then
- $f(x) \in \mathbb{R}$
- $g(x) \in \mathbb{R}^{1 \times n}$
- $h(x) \in \mathbb{R}^n$
- $f'(x) \in \mathbb{R}^n$
- $g'(x) \in \mathbb{R}^{n \times n}$
- $h'(x) \in \mathbb{R}^{n \times n}$
I have already worked out for myself that $g'(x) = B$ and $h'(x) = I_n$, so I won't repeat those derivations here.
Then, using the product rule, I get:
$$f'(x) = g'(x)h(x) + g(x)h'(x)$$
The problem is that the dimensions don't match. I get $g'(x)h(x) = Bx \in \mathbb{R}^{n}$, which is good. However, I also have $g(x)h'(x) = x^TBI_n = x^TB \in \mathbb{R}^{1 \times n}$, and as far as I know I can't add two vectors of different shapes.
I know that the solution is going to involve the transpose of the second term; I just can't seem to find where that transpose would come from.
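To convince myself that transposing the second term really gives the right answer, I put together a small numerical sketch in plain Python. The matrix `B`, the vector `x`, and the helper names (`mat_vec`, `vec_mat`, `numerical_gradient`) are arbitrary illustrative choices of mine. It compares $Bx + (x^TB)^T$ against a central finite-difference estimate of the partial derivatives of $f$:

```python
# Numerical sanity check in pure Python (no external libraries).
# B and x are arbitrary illustrative values, not taken from any source.

def f(B, x):
    """Quadratic form f(x) = x^T B x."""
    n = len(x)
    return sum(x[i] * B[i][j] * x[j] for i in range(n) for j in range(n))

def mat_vec(B, x):
    """Entries of the column vector B x, stored as a flat list."""
    return [sum(B[i][j] * x[j] for j in range(len(x))) for i in range(len(B))]

def vec_mat(x, B):
    """Entries of the row vector x^T B, stored as a flat list."""
    return [sum(x[i] * B[i][j] for i in range(len(x))) for j in range(len(B[0]))]

def numerical_gradient(B, x, eps=1e-6):
    """Central finite-difference estimate of each partial derivative of f."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        grad.append((f(B, x_plus) - f(B, x_minus)) / (2 * eps))
    return grad

B = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]  # deliberately non-symmetric
x = [1.0, -2.0, 3.0]

first_term = mat_vec(B, x)    # entries of B x (a column vector)
second_term = vec_mat(x, B)   # entries of x^T B (a row vector), equal to those of B^T x
candidate = [a + b for a, b in zip(first_term, second_term)]
numeric = numerical_gradient(B, x)
max_error = max(abs(c - g) for c, g in zip(candidate, numeric))

print("B x + (x^T B)^T :", candidate)
print("finite diff     :", numeric)
print("max abs error   :", max_error)
```

Because the flat lists drop the row/column distinction, the entrywise sum here is exactly "first term plus the transpose of the second term". The two results agree up to rounding, so the formula itself seems fine and only my derivation of the second term is off.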
Why do I need to take the transpose of the second term?
[Edit]: Please don't reply with a different proof. What I'm looking for is to understand where I made the mistake in my calculation. Obviously one of my steps was incorrect, and unless I understand which one, I'm likely to make the same mistake again.