How do you find the gradient of $f(x)=(a^T x)(b^T x)$ where $a$, $b$, and $x$ are $n$-dimensional vectors?

So far, I have tried differentiating with the product rule: $$ D(f(x)) = D[(a^Tx)(b^Tx)] = (a^Tx)D(b^Tx) + (b^Tx)D((a^Tx)^T)$$ which leads me to: $$ (a^Tx)b^T + (b^Tx)(a^Tx)^T$$ but I'm not sure how to proceed.

3 Answers

2

Some facts and notation before we derive the gradient:

  • Trace and Frobenius product relation $$\left\langle A, B C\right\rangle={\rm tr}(A^TBC) := A : B C$$
  • Cyclic properties of the trace/Frobenius product \begin{align} A : B C &= BC : A \\ &= A C^T : B \\ &= \text{etc.} \end{align}
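The identities above can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `frob`, the matrix shapes, and the seed are illustrative choices, not part of the original answer. The last rearrangement, $B^TA : C$, is one of the forms covered by "etc."

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A : B = tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(0)      # arbitrary seed, for reproducibility only
A = rng.standard_normal((3, 4))
B = rng.standard_normal((3, 5))
C = rng.standard_normal((5, 4))     # so that B @ C has the same shape as A

lhs = frob(A, B @ C)                # A : BC
swapped = frob(B @ C, A)            # BC : A
moved_right = frob(A @ C.T, B)      # A C^T : B
moved_left = frob(B.T @ A, C)       # B^T A : C

print(np.allclose([swapped, moved_right, moved_left], lhs))
```

All four quantities equal $\operatorname{tr}(A^TBC)$, so the comparison should hold up to floating-point rounding.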

To this end, we rewrite your function, using the fact that $a^Tx$ is a scalar and therefore equals its own transpose: \begin{align} f(x) &= a^T x b^T x\\ &= (a^Tx)^T b^Tx \\ &= a^Tx : b^T x \end{align}

Now we can obtain the differential first, and then read off the gradient $\frac{\partial f(x)}{\partial x}$ as the factor paired with $dx$. \begin{align} df(x) &= \left( a^T dx: b^T x \right) + \left( a^T x: b^T dx \right) \\ &= \left( b^T x : a^T dx \right) + \left( a^T x: b^T dx \right) \\ &= \left( ab^T x : dx \right) + \left( ba^T x: dx \right) \end{align}

Thus, the gradient is \begin{align} \frac{\partial f(x)}{\partial x} = a b^T x + ba^T x. \end{align}
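One way to gain confidence in this result is to compare it against a central finite-difference approximation. This is only a sketch, assuming NumPy; the function names `f`, `grad_f`, and `numerical_grad`, the step size `h`, and the random test vectors are all illustrative assumptions.

```python
import numpy as np

def f(x, a, b):
    """f(x) = (a^T x)(b^T x)."""
    return (a @ x) * (b @ x)

def grad_f(x, a, b):
    """Closed form a b^T x + b a^T x, without forming n-by-n matrices."""
    return a * (b @ x) + b * (a @ x)

def numerical_grad(x, a, b, h=1e-6):
    """Central finite differences, one coordinate at a time."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e, a, b) - f(x - e, a, b)) / (2 * h)
    return g

rng = np.random.default_rng(1)           # arbitrary seed
a, b, x = rng.standard_normal((3, 5))    # three random vectors in R^5
max_err = np.max(np.abs(grad_f(x, a, b) - numerical_grad(x, a, b)))
print(max_err)
```

Because $f$ is quadratic in $x$, the central difference has no truncation error, so any discrepancy should come from floating-point rounding alone.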

user550103
2

Let $f : \mathbb R^n \to \mathbb R$ be defined by

$$f (\mathrm x) := \left( \mathrm a^\top \mathrm x \right) \left( \mathrm b^\top \mathrm x \right) = \left( \mathrm x^\top \mathrm a \right) \left( \mathrm b^\top \mathrm x \right) = \mathrm x^\top \mathrm a \mathrm b^\top \mathrm x$$

whose gradient is

$$\nabla f (\mathrm x) = \color{blue}{\left(\mathrm a \mathrm b^\top + \mathrm b \mathrm a^\top \right) \mathrm x}$$
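The step from the quadratic form to the gradient uses the standard rule $\nabla(\mathrm x^\top \mathrm M \mathrm x) = (\mathrm M + \mathrm M^\top)\mathrm x$ with $\mathrm M = \mathrm a \mathrm b^\top$. A small numeric illustration, assuming NumPy and hand-picked vectors that are not from the original answer:

```python
import numpy as np

# Illustrative vectors (assumed for this example only)
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
x = np.array([1.0, -1.0, 2.0])

M = np.outer(a, b)                        # the rank-one matrix a b^T

quad = x @ M @ x                          # x^T a b^T x
prod = (a @ x) * (b @ x)                  # (a^T x)(b^T x)

grad_matrix = (M + M.T) @ x               # (a b^T + b a^T) x
grad_vector = (a @ x) * b + (b @ x) * a   # same gradient, no n-by-n matrix

print(quad, prod)       # 55.0 55.0
print(grad_matrix)      # [31. 47. 63.]
print(grad_vector)      # [31. 47. 63.]
```

The second form needs only two inner products and is usually preferable when $n$ is large.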


0

Here's another way to solve the problem:

$$f(x)=(a^Tx)(b^Tx)$$
$$f(x)=\left(\begin{bmatrix}a_1&a_2&\cdots&a_n\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\right)\left(\begin{bmatrix}b_1&b_2&\cdots&b_n\end{bmatrix}\begin{bmatrix}x_1\\x_2\\\vdots\\x_n\end{bmatrix}\right)$$
$$f(x)=(a_1x_1+a_2x_2+\cdots+a_nx_n)(b_1x_1+b_2x_2+\cdots+b_nx_n)$$
Now we need to take the partial derivative with respect to $x_i$:
$$\frac{\partial f}{\partial x_i}=(a_1x_1+a_2x_2+\cdots+a_nx_n)\frac{\partial}{\partial x_i}(b_1x_1+b_2x_2+\cdots+b_nx_n) + \left[\frac{\partial}{\partial x_i}(a_1x_1+a_2x_2+\cdots+a_nx_n)\right](b_1x_1+b_2x_2+\cdots+b_nx_n)$$
$$\frac{\partial f}{\partial x_i}=(a_1x_1+a_2x_2+\cdots+a_nx_n)\,b_i + a_i\,(b_1x_1+b_2x_2+\cdots+b_nx_n)$$
When we create the gradient by evaluating $\frac{\partial f}{\partial x_i}$ for every value of $i$, we get:
$$\nabla f=\begin{bmatrix}(a^Tx)\,b_1+(b^Tx)\,a_1\\(a^Tx)\,b_2+(b^Tx)\,a_2\\\vdots\\(a^Tx)\,b_n+(b^Tx)\,a_n\end{bmatrix}$$
$$\nabla f=(a^Tx)\,b + (b^Tx)\,a$$
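The component-wise formula can be traced on a tiny case. The sketch below assumes NumPy; the helper `partial` and the two-dimensional vectors are made up for illustration. With $a=(1,2)$, $b=(3,4)$, $x=(1,1)$ we have $a^Tx=3$ and $b^Tx=7$.

```python
import numpy as np

def partial(i, x, a, b):
    """df/dx_i = (a^T x) b_i + a_i (b^T x), as derived component-wise."""
    return (a @ x) * b[i] + a[i] * (b @ x)

# Illustrative 2-D example (values assumed, not from the original post)
a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
x = np.array([1.0, 1.0])

# Stack the partial derivatives into the gradient vector
grad = np.array([partial(i, x, a, b) for i in range(x.size)])
print(grad)                               # [16. 26.]

# Vector form of the same result
print((a @ x) * b + (b @ x) * a)          # [16. 26.]
```

By hand: $3\cdot 3 + 1\cdot 7 = 16$ and $3\cdot 4 + 2\cdot 7 = 26$.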

I hope this helps!

63677