
I generalize from this question that $\nabla_x(x^TA) = \nabla_x(A^Tx)=A^T$.

However, I'm having trouble with $\nabla_{x^T}(x^TA)$. What does it mean to take the gradient of a transpose of a vector?

  • The question you linked is about a derivative of that function, not the gradient. You can take the gradient of a scalar function, not of a vector $x^TA$. (I assume you are not talking about tensors here.) – Mathemagical Jan 16 '18 at 01:05

1 Answer


There are some issues with the formula you wrote.

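  1. First, $x^{T}A \neq A^{T}x$: the left side is a row vector and the right side is a column vector. They are transposes of each other, $(x^{T}A)^{T} = A^{T}x$, but not equal.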
  2. Second, you can only take the gradient of a scalar function. This is normally defined as the column vector $\nabla f = \frac{\partial f}{\partial x^{T}}$. In order to take "gradients" of vector fields, you'd need to introduce higher order tensors and covariant derivatives, but that's another story.
  3. Maybe by $\nabla_{x}$ you meant $\frac{\partial}{\partial x}$. In that case, neither $\frac{\partial x^{T}A}{\partial x} = \frac{\partial A^{T}x}{\partial x}$ nor $\frac{\partial x^{T}A}{\partial x} = A^{T}$ holds, because of (1). Nevertheless, $\frac{\partial A^{T}x}{\partial x} = A^{T}$ does hold, since $A^{T}x$ is linear in $x$ with coefficient matrix $A^{T}$.
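
A quick numerical check of point (1) can make the shape issue concrete. This is only a minimal NumPy sketch: the sizes `n`, `m` and the random `A` and `x` are arbitrary choices for illustration, not anything from the question.

```python
import numpy as np

# Arbitrary illustrative sizes and data (assumptions, not from the question).
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, m))   # A is n x m
x = rng.standard_normal((n, 1))   # x is an n x 1 column vector

row = x.T @ A   # x^T A: a 1 x m row vector
col = A.T @ x   # A^T x: an m x 1 column vector

# Same entries, different shapes: one is the transpose of the other.
are_transposes = np.allclose(row.T, col)
print(row.shape, col.shape, are_transposes)
```

The two arrays hold the same numbers, but they are different objects, which is why formulas that treat them as interchangeable break down.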

Well, I don't want to be all negative. Here are a couple of properties of derivatives w.r.t. a vector.


Say you have two column vectors $x,y\in\mathbb{R}^{n}$ and a scalar function $f$. Then the derivative $\frac{\partial f}{\partial x}$ is a row vector, and the derivative $\frac{\partial f}{\partial x^{T}}$ is a column vector.

For the scalar $x^{T}y = y^{T}x$ you have $$\frac{\partial x^{T}}{\partial x}y = \frac{\partial x^{T}y}{\partial x} = \frac{\partial y^{T}x}{\partial x} = y^{T}\frac{\partial x}{\partial x} = y^{T}$$ $$y^{T}\frac{\partial x}{\partial x^{T}} = \frac{\partial y^{T}x}{\partial x^{T}} = \frac{\partial x^{T}y}{\partial x^{T}} = \frac{\partial x^{T}}{\partial x^{T}}y = y$$
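
Written out in components, the same result can be sketched as follows (the index $i$ and the component notation $x_i$, $y_i$ are introduced here for illustration and are not part of the formulas above): $$x^{T}y = \sum_{i=1}^{n} x_{i}y_{i} \quad\Longrightarrow\quad \frac{\partial (x^{T}y)}{\partial x_{i}} = y_{i}, \qquad i = 1,\dots,n.$$ Collecting these $n$ partial derivatives in a row gives $\frac{\partial x^{T}y}{\partial x} = y^{T}$, and collecting them in a column gives $\frac{\partial x^{T}y}{\partial x^{T}} = y$, under the row/column convention stated above.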

But for the derivative of a vector w.r.t. another vector there are no nice formulas except for the obvious ones. $$\frac{\partial Ax}{\partial x} = A$$ $$\frac{\partial x^{T}A}{\partial x^{T}} = A$$
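
These two formulas can be checked numerically. The sketch below uses a central-difference Jacobian; the helper `numerical_jacobian`, the matrices `A` and `B`, and the sizes are all hypothetical choices for illustration. It also assumes one reading of the second formula: that $\frac{\partial x^{T}B}{\partial x^{T}}$ has entry $(i,j)$ equal to $\partial (x^{T}B)_{j} / \partial x_{i}$, i.e. the transpose of the usual Jacobian.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian J[i, j] = d f_i / d x_j for f: R^n -> R^m."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(f(x)).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        diff = np.asarray(f(x + step)) - np.asarray(f(x - step))
        J[:, j] = diff.ravel() / (2 * eps)
    return J

# Arbitrary illustrative data (assumptions, not from the answer).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # maps R^4 -> R^3 via x -> A x
B = rng.standard_normal((4, 3))   # x^T B has 3 entries
x = rng.standard_normal(4)

J_Ax = numerical_jacobian(lambda v: A @ v, x)    # Jacobian of A x
J_xTB = numerical_jacobian(lambda v: v @ B, x)   # Jacobian of the entries of x^T B

print(np.allclose(J_Ax, A, atol=1e-6))      # d(Ax)/dx matches A
print(np.allclose(J_xTB.T, B, atol=1e-6))   # transposed Jacobian of x^T B matches B
```

With 1-D arrays NumPy does not distinguish rows from columns, so the row/column bookkeeping has to be done by hand, which is exactly where the different conventions come in.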

Jackozee Hakkiuz
  • Interesting, but why is the gradient a row vector if it is the gradient of a function of a column vector variable? Is the opposite also true? Is there maybe a name or keyword for this, or some reference? – Hjan Feb 09 '23 at 08:41
  • @Hjan it is one possible convention out of many. Unfortunately there is no unified way to treat this, so the conventions vary from book to book. You can find more information about the most common conventions in the wiki about matrix calculus. In my answer to this question I stuck to the convention that was already being used in the question that OP linked. Hope this helps. – Jackozee Hakkiuz Feb 09 '23 at 08:47