For your first question, whether a function is "scalar-valued" or not doesn't depend on the coordinate system of the domain. Any function that evaluates to a value in the underlying field, in this case $\mathbb{R}$, is scalar-valued. Unless explicitly stated otherwise though, people usually assume Cartesian coordinates with the standard basis vectors when they are referencing $\mathbb{R}^n$.
Your second question takes a little more work to answer but the short answer is, yes, as it is typically defined, the gradient is specified in terms of Cartesian coordinates but there is a much better approach. I have recently been working on material that is directly related to this. See, for example, this question I recently posed. The point there was the following:
If $f:X\subset \mathbb{R}^n \rightarrow \mathbb{R}$, then the derivative of $f$ at $x_0$, $df_{x_0}$, is a linear function $df_{x_0}:\mathbb{R}^n \rightarrow \mathbb{R}$. By the Riesz representation theorem there exists a unique vector in $\mathbb{R}^n$, which we denote by $\nabla f(x_0)$, that satisfies
$$
df_{x_0}(v) = g(v, \nabla f(x_0))
$$
for every $v \in \mathbb{R}^n$, where $g$ is an inner product on $\mathbb{R}^n$. Note that this definition is free of coordinates but does require the existence of an inner product (which may or may not be the standard one). See, for example, [AMANN, p. 160] or [FRANKEL, p. 46] for a discussion of this perspective.
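As a quick sanity check of this defining relation, one can compare a finite-difference approximation of $df_{x_0}(v)$ against $g(v, \nabla f(x_0))$ for a concrete test function; the function $f(x,y) = x^2 + y$ and the sample point below are arbitrary choices of mine, with $g$ the standard dot product:

```python
# Sanity check of df_{x0}(v) = g(v, grad f(x0)) for the (hypothetical)
# test function f(x, y) = x^2 + y, with g the standard dot product.
def f(x, y):
    return x**2 + y

def directional_derivative(f, x0, v, h=1e-6):
    # Central-difference approximation of df_{x0}(v)
    (x, y), (vx, vy) = x0, v
    return (f(x + h * vx, y + h * vy) - f(x - h * vx, y - h * vy)) / (2 * h)

x0 = (1.0, 2.0)
grad = (2 * x0[0], 1.0)                  # grad f = (2x, 1)
v = (0.3, -0.7)
lhs = directional_derivative(f, x0, v)   # df_{x0}(v)
rhs = grad[0] * v[0] + grad[1] * v[1]    # g(v, grad f(x0))
```

The two sides agree to within the finite-difference error.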
Now, it will turn out that if you do use standard Cartesian coordinate vectors then you can recover the "typical" definition of the gradient from this one. To see this though, and to see where the expression for the gradient in spherical coordinates that you provided in your question comes from, requires us to dig deeper.
Now, it can be shown that
$$
\nabla f(x_0) = (g^{1k} \partial_k f(x_0), \dots, g^{nk} \partial_k f(x_0))
$$
where $g^{ij}$ denotes the $i,j$ entry of the inverse of the matrix $G = [g_{ij}]$ (and repeated indices are summed over). I'll refer you again to my previous question for the details of this statement. So, this expression gives us a concrete way to actually calculate the gradient, but in order to do so we will need to figure out how to compute the matrix $G$.
For a (tractable) example, let us consider polar coordinates. They are related to Cartesian coordinates by the well-known formulae $x = r \cos (\theta)$ and $y = r \sin (\theta)$. It can be shown that the matrix $G$ is determined by the relation $G = J^TJ$ where $J$ is the Jacobian of the transformation in question. See [KAY, p. 54] for a reference to this fact.
In the case of polar coordinates, the transformation is given by
$$
T(r,\theta)= (r \cos (\theta), r \sin (\theta))
$$
The Jacobian of the transformation then is
$$
J = \begin{pmatrix}
\cos (\theta) & -r \sin (\theta) \\
\sin (\theta) & r \cos(\theta)
\end{pmatrix}
$$
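If you'd rather not trust the hand computation, the Jacobian can be checked numerically with central differences; a minimal sketch (the sample point $(r_0, \theta_0)$ is an arbitrary choice):

```python
import math

def T(r, th):
    # Polar-to-Cartesian transformation T(r, theta) = (r cos(theta), r sin(theta))
    return (r * math.cos(th), r * math.sin(th))

def jacobian(T, r, th, h=1e-6):
    # 2x2 Jacobian of T at (r, theta), columns approximated by central differences
    col_r  = [(a - b) / (2 * h) for a, b in zip(T(r + h, th), T(r - h, th))]
    col_th = [(a - b) / (2 * h) for a, b in zip(T(r, th + h), T(r, th - h))]
    return [[col_r[0], col_th[0]], [col_r[1], col_th[1]]]

r0, th0 = 2.0, 0.7
J = jacobian(T, r0, th0)
J_exact = [[math.cos(th0), -r0 * math.sin(th0)],
           [math.sin(th0),  r0 * math.cos(th0)]]
err = max(abs(J[i][j] - J_exact[i][j]) for i in range(2) for j in range(2))
```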
After working through the details we find then that
$$
G = J^TJ = \begin{pmatrix}
1 & 0 \\
0 & r^2
\end{pmatrix}
$$
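Here is the same computation done numerically, confirming that $G = J^TJ$ comes out to $\mathrm{diag}(1, r^2)$ at a sample point (the point is an arbitrary choice):

```python
import math

r0, th0 = 2.0, 0.7   # arbitrary sample point
J = [[math.cos(th0), -r0 * math.sin(th0)],
     [math.sin(th0),  r0 * math.cos(th0)]]

# G = J^T J, i.e. G[i][j] = sum_k J[k][i] * J[k][j]
G = [[sum(J[k][i] * J[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
# Expect G = [[1, 0], [0, r0**2]] up to rounding
```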
Therefore,
$$
G^{-1} = \begin{pmatrix}
1 & 0 \\
0 & \frac{1}{r^2}
\end{pmatrix}
$$
From this matrix we can read off the $g^{ij}$ components, from which it follows that
$$
\nabla f(x_0) = (\partial_r f(x_0), \frac{1}{r^2} \partial_{\theta} f(x_0))
$$
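One can check numerically that these components really do satisfy the defining relation $df_{x_0}(v) = g(v, \nabla f(x_0))$ with $G = \mathrm{diag}(1, r^2)$; a sketch using a hypothetical test function $f(r,\theta) = r^2 \cos(\theta)$ of my own choosing:

```python
import math

# Hypothetical test function in polar coordinates: f(r, theta) = r^2 cos(theta)
r0, th0 = 1.5, 0.4
df_dr  = 2 * r0 * math.cos(th0)       # partial_r f
df_dth = -r0**2 * math.sin(th0)       # partial_theta f

grad = (df_dr, df_dth / r0**2)        # gradient components from the G^{-1} formula
G = [[1.0, 0.0], [0.0, r0**2]]        # metric matrix in polar coordinates

# df applied to the coordinate direction v = (0, 1) is just partial_theta f,
# and it must equal g(v, grad f) = G[1][1] * grad[1].
lhs = df_dth
rhs = G[1][1] * grad[1]
```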
But we are still not done. This expression does not agree with the one you will usually encounter, which is
$$
\nabla f(x_0) = (\partial_r f(x_0), \frac{1}{r} \partial_{\theta} f(x_0))
$$
which differs from ours by a factor of $r$ in the second component. So, what's going on here? First, note that since we are working in $r$-$\theta$ coordinates, the gradient vector is expressed relative to the $r$-$\theta$ basis. Our component-wise notation is obscuring this fact. So what we actually have is
$$
\nabla f(x_0) = \partial_r f(x_0) e_r + \frac{1}{r^2} \partial_{\theta}f(x_0)e_{\theta}
$$
where $e_r$ and $e_{\theta}$ denote the $r$-$\theta$ basis vectors. So, what are they? Well, you can always use geometry to figure this out but, since I'm really lousy at geometry, I like to think of them as being defined analytically as tangent vectors to the coordinate curves. See [KOKS, p. 298] for a discussion of this perspective. To determine them we just differentiate our transformation $T$ with respect to $r$ and $\theta$, respectively. Thus
$$
e_r = \partial_r T = (\cos (\theta), \sin(\theta))
$$
and
$$
e_{\theta} = \partial_{\theta}T = r(-\sin( \theta), \cos (\theta))
$$
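Before going further, it's worth checking the lengths of these two vectors numerically at a sample point (the point $(r_0, \theta_0)$ below is an arbitrary choice):

```python
import math

r0, th0 = 2.0, 1.1                                 # arbitrary sample point
e_r  = (math.cos(th0), math.sin(th0))              # partial_r T
e_th = (-r0 * math.sin(th0), r0 * math.cos(th0))   # partial_theta T

norm_e_r  = math.hypot(*e_r)    # expect 1
norm_e_th = math.hypot(*e_th)   # expect r0
```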
Note, though, that while $e_r = \hat{e_r}$ is a unit vector, $e_{\theta}$ is not. Since $r \geq 0$, a quick computation shows that $|e_{\theta}| = r$, so $e_{\theta} = r\hat{e_{\theta}}$. Therefore, the gradient with respect to the unit basis vectors is given by
$$
\nabla f(x_0) = \partial_r f(x_0) \hat{e_r} + \frac{1}{r} \partial_{\theta}f(x_0)\hat{e_\theta}
$$
and we thus have agreement with the common expression for the gradient in polar coordinates.
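As a final consistency check, the polar expression can be reassembled into Cartesian components for a hypothetical test function of my own choosing, say $f(x,y) = x$, i.e. $f = r\cos(\theta)$ in polar coordinates, whose Cartesian gradient is of course $(1, 0)$:

```python
import math

# Hypothetical check: f(x, y) = x, i.e. f(r, theta) = r cos(theta) in polar coordinates
r0, th0 = 1.7, 0.9   # arbitrary sample point

# Polar gradient components against the unit vectors e_r_hat, e_theta_hat:
df_dr  = math.cos(th0)            # partial_r f
df_dth = -r0 * math.sin(th0)      # partial_theta f
comp_r, comp_th = df_dr, df_dth / r0

# Unit basis vectors at (r0, th0):
e_r_hat  = (math.cos(th0), math.sin(th0))
e_th_hat = (-math.sin(th0), math.cos(th0))

# Reassemble the gradient as a Cartesian vector; for f(x, y) = x it
# should come out to (1, 0) up to rounding.
grad = tuple(comp_r * a + comp_th * b for a, b in zip(e_r_hat, e_th_hat))
```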
References:
[AMANN] Amann and Escher, Analysis II
[FRANKEL] Frankel, The Geometry of Physics
[KAY] Kay, Schaum's Outline of Tensor Calculus
[KOKS] Koks, Explorations in Mathematical Physics