32

Kaplan's Advanced Calculus defines the gradient of a function $f : \mathbb{R}^n \to \mathbb{R}$ as the $1 \times n$ row vector whose entries respectively contain the $n$ partial derivatives of $f$. Under this definition, the gradient is just the Jacobian matrix of the transformation.

We also know that, by the Riesz representation theorem, if $f$ is differentiable at the point $x$ then we can define the gradient as the unique vector $\nabla f(x)$ such that

$$ df(x)(h) = \langle h, \nabla f(x) \rangle, \quad h \in \mathbb{R}^n $$
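This identity can be checked numerically for a concrete choice of $f$ (a made-up example, not from Kaplan): the directional derivative $df(x)(h)$, approximated by a finite difference, should match the inner product $\langle h, \nabla f(x) \rangle$.

```python
import numpy as np

# Hypothetical example: f(x) = x0^2 + 3*x1, so grad f(x) = (2*x0, 3).
f = lambda x: x[0]**2 + 3*x[1]
grad_f = lambda x: np.array([2*x[0], 3.0])

x = np.array([1.0, 2.0])
h = np.array([0.5, -0.25])

# Directional derivative df(x)(h) via a symmetric finite difference ...
eps = 1e-6
df_x_h = (f(x + eps*h) - f(x - eps*h)) / (2*eps)

# ... should agree with the inner product <h, grad f(x)>.
print(df_x_h, np.dot(h, grad_f(x)))
```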

If we ignore the distinction between row vectors and column vectors, the former definition follows easily from the latter. But row vectors and column vectors are not the same thing. So I have the following questions:

  1. Is the distinction here between row/column vectors important?
  2. If (1) is true, then how can we know from the second definition that the vector in question is a row vector and not a column vector?
ItsNotObvious
  • The gradient as a row vector seems pretty non-standard to me. I'd say vectors are column vectors by definition (or the usual convention), so $df(x)$ is a row vector (as it is a functional) while $\nabla f(x)$ is a column vector (the scalar product is a product of two vectors). And yes, the distinction is important. – t.b. Jul 28 '11 at 21:46
  • Kaplan seems to really be describing $df$, not $\nabla f$. – Qiaochu Yuan Jul 28 '11 at 22:03
  • @Qiaochu Near the top of page 94 he writes "The Jacobian matrix of $f$ is the row vector $(\partial_x f, \partial_y f, \partial_z f)$. We call this vector the gradient vector of $f$ and write $\nabla f$". – ItsNotObvious Jul 28 '11 at 22:12
  • 1
    @3Sphere: yes, and...? I don't think that's the standard definition of the gradient in general. – Qiaochu Yuan Jul 28 '11 at 22:17
  • @Qiaochu No "and" really other than just to note that, according to what both you and Theo have indicated, Kaplan's definition there is incorrect. Which is unfortunate since I absorbed that definition some time ago and it has been living in my head ever since... – ItsNotObvious Jul 28 '11 at 22:24
  • 1
    @3Sphere: well, "incorrect" is slightly overstating it. There are two reasonable things one might naively call the gradient (a certain family of tangent vectors, or the family of linear functionals it defines) on $\mathbb{R}^n$ and these things happen to generalize differently. The definition is probably fine for working on $\mathbb{R}^n$ although I personally get confused when people don't distinguish between vectors and covectors (e.g. in intro special relativity courses) since they transform differently under a change of variables. – Qiaochu Yuan Jul 28 '11 at 22:32
  • @ItsNotObvious I'm a little confused about your equation $df(x)(h)=\left\langle{h},\nabla{f}(x)\right\rangle$. I intuit gradients as row vectors (cotangent covectors) that are multiplied on-the-right by differential tangent (column) vectors, which would be your $h$'s. I thus expected to see $\left\langle\nabla{f}(x),h\right\rangle$. Maybe my intuition conflates differential forms and gradients (see answer below), and I'm just accustomed always to think about differential forms. – Reb.Cabin Mar 03 '18 at 18:53

2 Answers

22

Yes, the distinction between row vectors and column vectors is important. On an arbitrary smooth manifold $M$, the derivative of a function $f : M \to \mathbb{R}$ at a point $p$ is a linear transformation $df_p : T_p(M) \to \mathbb{R}$; in other words, it's a cotangent vector. In general the tangent space $T_p(M)$ does not come equipped with an inner product (this is an extra structure: see Riemannian manifold), so in general we cannot identify tangent vectors and cotangent vectors.

So on a general manifold one must distinguish between vector fields (families of tangent vectors) and differential $1$-forms (families of cotangent vectors). While $df$ is a differential form and exists for all $M$, $\nabla f$ can't be sensibly defined unless $M$ has a Riemannian metric, and then it's a vector field (and the identification between differential forms and vector fields now depends on the metric).

If one thinks of tangent vectors as column vectors, then $\nabla f$ ought to be a column vector, but the linear functional $\langle -, \nabla f \rangle$ ought to be a row vector. A major problem with working entirely in bases is that distinctions like these are frequently glossed over, and then when they become important students are very confused.
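In coordinates, this distinction can be made concrete with array shapes. The following sketch (a made-up function $f(x, y) = xy$ at the point $(2, 3)$, not from the answer) keeps the tangent vector $\nabla f$ as an $n \times 1$ column and the cotangent vector $df$ as a $1 \times n$ row:

```python
import numpy as np

# f(x, y) = x*y at the point p = (2, 3); the partials are (y, x) = (3, 2).
grad = np.array([[3.0], [2.0]])   # tangent vector: n x 1 column
df   = grad.T                     # cotangent vector: 1 x n row

h = np.array([[1.0], [4.0]])      # a tangent (column) vector

# df acts on h by matrix multiplication; the pairing <h, grad> gives the
# same number, but only because we used the standard inner product.
print(df @ h)        # 1x1 matrix: [[11.]]
print(h.T @ grad)    # same value via the pairing
```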


Some remarks about non-canonicity. The tangent space $T_p(V)$ to a vector space at any point can be canonically identified with $V$, so for vector spaces we don't run into quite the same problems. If $V$ is an inner product space, then in the same way it automatically inherits the structure of a Riemannian manifold by the above identification. Finally, when people write $V = \mathbb{R}^n$ they frequently intend $\mathbb{R}^n$ to have the standard inner product with respect to the standard basis, and this equips $V$ with the structure of a Riemannian manifold.

Qiaochu Yuan
  • It is important to keep track of what things are and what extra structures they depend on, but distinctions like row vector versus column vector are essentially cosmetic until you have a whole family of objects, and then only because you want natural operations on your vectors to correspond to geometric operations and not nonsense (and you want the geometric operations to correspond to vector operations which are as simple as possible). If you are considering "row vector vs. column vector" you have already fixed a basis, so much of the intrinsic structure is already lost. – Aaron Jul 28 '11 at 22:06
  • @Aaron: well, I am just using "column vector" as a euphemism for "tangent vector" and "row vector" as a euphemism for "cotangent vector." I prefer that nobody use these terrible terms, but as long as the OP is... – Qiaochu Yuan Jul 28 '11 at 22:08
  • 1
    Looking at the Jacobian matrix makes it really look like row vectors dont it? i.e thinking of it as a linear map on the space on which it's function act. –  Feb 04 '18 at 19:27
9

Here's a simple heuristic. TL;DR: If the domain of $f: R^n\to R$ is a space of column vectors, then $f'(x)$ needs to be a row vector for the linear approximation property to make sense.

To make the distinction between row and column vectors explicit, I'll write $R^{k\times 1}$ and $R^{1\times k}$ for the spaces of $k$-dimensional column and row vectors of real numbers, respectively.

If $f:R^{n\times 1}\to R^{m\times 1}$, then, for every $x\in R^{n\times 1}$, the derivative $f'(x)$ is characterized by its linear approximation property, $$ f(x + h) \approx f(x) + f'(x)h, $$ for small $h\in R^{n\times 1}$. Now think about the sizes of the matrices involved: $f(x)$ has size $m\times 1$, so $f'(x)h$ also needs to have size $m\times 1$ for the right hand side to make sense. But $h\in R^{n\times 1}$, so $f'(x)h$ has size $m\times 1$ exactly when $f'(x)$ has size $m\times n$.

In particular, if $f:R^{n\times 1}\to R^{1\times 1}$, then $f'(x)$ needs to be a $1\times n$ matrix, i.e., a row vector.
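The size bookkeeping above can be verified directly. The linear map below is an arbitrary illustration (not from the answer); since $f$ is linear, $f(x+h) - f(x)$ equals $f'(x)h$ exactly, and the shapes confirm that $f'(x)$ must be $m \times n$:

```python
import numpy as np

m, n = 3, 2
A = np.arange(m*n, dtype=float).reshape(m, n)   # a fixed m x n matrix
f = lambda x: A @ x                             # f: R^{n x 1} -> R^{m x 1}, so f'(x) = A

x = np.ones((n, 1))
h = 1e-3 * np.ones((n, 1))

# f(x + h) - f(x) equals f'(x) h exactly because f is linear;
# both sides are m x 1 columns, forcing f'(x) to be m x n.
print((f(x + h) - f(x)).shape)   # (3, 1)
print((A @ h).shape)             # (3, 1)
```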

minggli
fmg