$
\newcommand\PD[2]{\frac{\partial#1}{\partial#2}}
\newcommand\tPD[2]{\partial#1/\partial#2}
\newcommand\dd{\mathrm d}
\newcommand\R{\mathbb R}
\newcommand\diff\underline
\newcommand\adj\overline
\DeclareMathOperator\Tr{Tr}
\newcommand\Hom{\mathrm{Hom}}
$The following is a slight adaptation of "What Is $\nabla$ Anyway?" from my answer here.
For simplicity I will only consider real vector spaces and the standard inner product,
but these constraints are not necessary.
Coordinate Expression
We could fiddle around with coordinates.
If $\{x^i\}_{i=1}^m$ are coordinates of $\R^m$,
then there is a function $p(x^1, x^2, \dotsc, x^m)$ taking coordinates to points.
This gives us a basis $\{e_i\}_{i=1}^m$ of $\R^m$ via $e_i = \tPD p{x^i}$,
whose reciprocal basis $\{e^i\}_{i=1}^m$ is the unique basis such that $e^i\cdot e_j = \delta^i_j$.
Now let $V$ be a vector space.
Then for any $L_x : \R^m \to V$ which is linear for each $x \in \R^m$, i.e. $L_x(v)$ is linear in $v$, we define
$$
L_x(\nabla) = \sum_{i=1}^m \PD{L_{p(x^1,\dotsc,x^m)}(e^i)}{x^i}.
$$
This definition comes from informally applying the linearity of $L$ to
$$
L_x(\nabla) = L_{p(x^1,\dotsc,x^m)}\left(\sum_{i=1}^me^i\PD{}{x^i}\right).
$$
We would then go through the tedious task of confirming that this definition is independent of the coordinates chosen.
But I don't like coordinates.
Without Coordinates
First, for any function $f : \R^m \to V$ into some normed vector space $V$,
the total differential $\diff f(x; {-}) : \R^m \to V$ of $f$ at $x \in \R^m$
is the linear function which best approximates $f$ at $x$.
This captures all notions of the variation of $f$,
and for any finite-dimensional $V$ is uniquely defined.
Consider again linear functions $L_x : \R^m \to V$;
if $\Hom(\R^m, V)$ is the set of linear functions $\R^m \to V$,
then we can think of $L$ as a function $\R^m \to \Hom(\R^m, V)$
and so $\diff L(x; v) \in \Hom(\R^m, V)$ for any $v \in \R^m$.
If $V$ is finite dimensional, then so is $\Hom(\R^m, V)$ and the differential is uniquely defined;
we will be assuming this is the case.
We then define the derivative of $L$ as
$$
L_x(\nabla) = \Tr_{v\cdot w}\diff L(x; v)(w)
\tag{Deriv}
$$
where $\Tr_{v\cdot w}$ is indicating a metric contraction
of the tensor $\diff L(x; {-})({-}) : \R^m\times\R^m \to V$.
Note that $L_x(\nabla) \in V$,
so in this sense $L_x$ takes $\nabla$ to $L_x(\nabla)$
just like it takes $v \in \R^m$ to $L_x(v)$.
It is easy to verify that
$$
(aL_x + bM_x)(\nabla) = aL_x(\nabla) + bM_x(\nabla),\quad a, b \in \R.
$$
The way this works, then, is by defining an appropriate $L$ (the first three cases below are verified numerically in the sketch after this list):
- If $f : \R^m \to \R$, then the gradient of $f$ is the derivative of $L_x(v) = vf(x)$.
- If $F : \R^m \to \R^m$, then the divergence of $F$ is the derivative of $L_x(v) = v\cdot F(x)$.
- If $F : \R^3 \to \R^3$, then the curl of $F$ is the derivative of $L_x(v) = v\times F(x)$.
- We can also have some things which are not expressible in standard notation.
If $L_x(v) = F(x)\cdot(v\times G(x))$,
then $L_x(\nabla)$ is not $F(x)\cdot(\nabla\times G(x))$;
$L_x(\nabla)$ is differentiating both $F(x)$ and $G(x)$.
- It is easy to show that an expression like $L_x(\nabla, \nabla)$ for $L_x$ bilinear
is independent of the order in which each $\nabla$ is applied
(so long as partial derivatives commute,
or equivalently $\diff{\diff L}(x; {-}, {-})$ is symmetric).
This lets us define e.g. the Laplacian of $f(x)$ via $L_x(v, w) = (v\cdot w)f(x)$;
notice that $f$ can be valued in any vector space.
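As a sanity check, here is a minimal numerical sketch of (Deriv) in Python with NumPy; the helper `nabla`, the test fields, and the sample point are my own illustrative choices. It evaluates the coordinate formula $\sum_i \tPD{L_x(e_i)}{x^i}$ in the standard basis (which is its own reciprocal) and recovers the gradient, divergence, and curl from the one definition.

```python
import numpy as np

def nabla(L, x, h=1e-6):
    """Evaluate (Deriv) in the standard basis (its own reciprocal):
    L_x(nabla) = sum_i d/dx^i [ L_x(e_i) ], by central differences."""
    return sum((L(x + h * e, e) - L(x - h * e, e)) / (2 * h)
               for e in np.eye(len(x)))

f = lambda x: x[0]**2 + x[1] * x[2]                          # scalar field
F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])  # vector field
x = np.array([0.3, -1.2, 0.7])

grad = nabla(lambda x, v: v * f(x), x)           # L_x(v) = v f(x)
div  = nabla(lambda x, v: v @ F(x), x)           # L_x(v) = v . F(x)
curl = nabla(lambda x, v: np.cross(v, F(x)), x)  # L_x(v) = v x F(x)

assert np.allclose(grad, [2 * x[0], x[2], x[1]], atol=1e-5)
assert np.isclose(div, x[1] + x[2], atol=1e-5)
assert np.allclose(curl, [-x[1], -2 * x[0], -x[0]], atol=1e-5)
```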
Motivation
The definition (Deriv) should be unsatisfying;
we basically just wrote the coordinate definition in coordinate-free language.
How can we justify it?
We make one assumption for $F : \R^m \to \R^m$:
$$
\nabla\cdot F(x) = \Tr_v\diff F(x; v),
$$
i.e. that $L_x(\nabla)$ for $L_x(v) = v\cdot F(x)$ is the trace of the differential of $F$.
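This assumption is the familiar coordinate fact that the divergence is the trace of the Jacobian; a quick finite-difference check (the test field is again an illustrative choice):

```python
import numpy as np

F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])
x, h = np.array([0.3, -1.2, 0.7]), 1e-6

# Jacobian of F at x: column i is the directional differential dF(x; e_i).
J = np.column_stack([(F(x + h * e) - F(x - h * e)) / (2 * h)
                     for e in np.eye(3)])
# Tr dF(x; -) agrees with the divergence of F, here x[1] + x[2].
assert np.isclose(np.trace(J), x[1] + x[2], atol=1e-5)
```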
Now, the inner product induces an isomorphism $\flat : \R^m \to (\R^m)^*$
via $v^\flat(w) = v\cdot w$, where $(\R^m)^* = \Hom(\R^m, \R)$ is the dual space of $\R^m$.
We denote $\sharp = \flat^{-1}$.
If $L_x : \R^m \to \R$, then $x \mapsto L_x^\sharp$ is a vector field to which the assumption applies, and since $\sharp$ is linear it commutes with the differential; so we argue
$$
L_x(\nabla) = L_x^\sharp\cdot\nabla = \Tr_u\diff L(x; u)^\sharp = \Tr_{u\cdot v}\diff L(x; u)(v).
$$
Thus when $L_x : \R^m \to V$ and $\alpha \in V^*$ we argue
$$
\alpha(L_x(\nabla))
= (\alpha\circ L_x)(\nabla)
= \Tr_{u\cdot v}\diff{\alpha\circ L_x}(x; u)(v)
= \Tr_{u\cdot v}\alpha\bigl(\diff L(x; u)(v)\bigr).
$$
Contracting the tensor $(w, \alpha) \mapsto w\alpha(L_x(\nabla))$ for $w \in V$ finally gives
$$\begin{aligned}
L_x(\nabla)
&= \Tr_{\alpha(w)}w\Tr_{u\cdot v}\alpha(\diff L(x; u)(v))
\\
&= \Tr_{u\cdot v}\Tr_{\alpha(w)}w\alpha(\diff L(x; u)(v))
\\
&= \Tr_{u\cdot v}\diff L(x; u)(v)
\end{aligned}$$
as desired.
The Chain Rule
Let $F : \R^m \to \R^n$, and let $L_y : \R^m \to V$ be linear for each $y \in \R^n$.
The chain rule takes the following form:
$$
L_{F(x)}(\nabla)
= L_{F(x)}\Bigl(\adj F(x; \nabla_F)\Bigr).
$$
$\adj F(x; {-})$ is the adjoint of $\diff F(x; {-})$ under the inner products on $\R^m$ and $\R^n$.
To be clear, what we mean by the RHS
is the derivative of $y \mapsto L_y(\adj F(x; {-}))$ evaluated at $y = F(x)$.
This chain rule can be expressed by the mnemonic
$$
\nabla_x = \adj F(x; \nabla_F).
$$
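Here is a numerical sketch of the chain rule in the same style as before; the maps $F$ and $G$ are illustrative, and I take $L_y(v) = v\cdot G(y)$, so the left-hand side is the divergence of $G\circ F$ and the adjoint is the transpose of the Jacobian matrix of $F$.

```python
import numpy as np

def nabla(L, x, h=1e-6):
    """(Deriv) by central differences: sum_i d/dx^i [ L_x(e_i) ]."""
    return sum((L(x + h * e, e) - L(x - h * e, e)) / (2 * h)
               for e in np.eye(len(x)))

def jac(f, x, h=1e-6):
    """Total differential df(x; -) as a matrix; column i is df(x; e_i)."""
    return np.column_stack([(f(x + h * e) - f(x - h * e)) / (2 * h)
                            for e in np.eye(len(x))])

F = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])  # R^2 -> R^3
G = lambda y: np.array([y[0] + y[1] * y[2], y[2]**2])         # R^3 -> R^2
x0 = np.array([0.4, -0.8])

# LHS: L_{F(x)}(nabla) with L_y(v) = v . G(y), i.e. the divergence of G o F.
lhs = nabla(lambda x, v: v @ G(F(x)), x0)

# RHS: derivative of y |-> L_y(adjF(x0; -)) at y = F(x0), where the
# adjoint of dF(x0; -) is the transpose of its Jacobian matrix.
adjF = jac(F, x0).T
rhs = nabla(lambda y, w: (adjF @ w) @ G(y), F(x0))

assert np.isclose(lhs, rhs, atol=1e-4)
```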
The Subexpression Rule
Consider $\R^{2m} \cong \R^m\oplus\R^m$ and a family of linear maps $L_{x\oplus y} : \R^m \to V$ indexed by points $x\oplus y$ of $\R^m\oplus\R^m$.
Then
$$
L_{x\oplus x}(\nabla) = L_{\dot x\oplus x}(\dot\nabla) + L_{x\oplus\dot x}(\dot \nabla),
$$
where by e.g. $L_{\dot x\oplus x}(\dot\nabla)$
we mean the derivative of the family $\dot x \mapsto L_{\dot x\oplus y}$, evaluated at $\dot x = y = x$;
in other words, the undotted $x$ is held constant when differentiating.
We can summarize this "subexpression rule" as follows:
the derivative of an expression is the sum of the derivatives of its subexpressions.
We will demonstrate this rule in the next section.
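Although the demonstration proper comes in the next section, here is a direct numerical sketch (same conventions as the earlier snippets; $F$ and $g$ are illustrative): taking $L_{x\oplus y}(v) = (v\cdot F(x))\,g(y)$, the rule reproduces the product rule $\nabla\cdot(g(x)F(x)) = (\nabla\cdot F(x))g(x) + F(x)\cdot\nabla g(x)$.

```python
import numpy as np

def nabla(L, x, h=1e-6):
    """(Deriv) by central differences: sum_i d/dx^i [ L_x(e_i) ]."""
    return sum((L(x + h * e, e) - L(x - h * e, e)) / (2 * h)
               for e in np.eye(len(x)))

F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])
g = lambda x: np.sin(x[0]) + x[1] * x[2]
x0 = np.array([0.3, -1.2, 0.7])

# L_{x (+) y}(v) = (v . F(x)) g(y); on the diagonal y = x this is div(g F).
full = nabla(lambda x, v: (v @ F(x)) * g(x), x0)

# Subexpression rule: differentiate one slot at a time, holding the other at x0.
dot_first  = nabla(lambda x, v: (v @ F(x)) * g(x0), x0)   # dotted first slot
dot_second = nabla(lambda y, v: (v @ F(x0)) * g(y), x0)   # dotted second slot
assert np.isclose(full, dot_first + dot_second, atol=1e-4)
```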
Vectorial Manipulation
It is a simple fact that if $L_x(v) = M_x(v)$ then $L_x(\nabla) = M_x(\nabla)$;
moreover, if $L_x$ and $M_x$ are multilinear
then $L_x(\nabla,\dotsc,\nabla) = M_x(\nabla,\dotsc,\nabla)$.
This means that as long as an expression involving $\nabla$ can be uniquely linearized,
the use of $\nabla$ is valid and $\nabla$ may be manipulated as a vector,
provided every instance of $x$ is differentiated.
(In this sense, the Laplacian $\nabla^2$ is not well-defined over fields of characteristic 2;
in characteristic not 2, there is a (unique) symmetric bilinear form $v\cdot w$ such that $v\cdot v = |v|^2$,
but in characteristic 2 this is not true.)
Some examples would be prudent.
Consider now a vector identity like (in $\R^3$)
$$
a\times(b\times c) = (a\cdot c)b - (a\cdot b)c.
$$
If $L(a, b, c)$ is the LHS and $M(a, b, c)$ the RHS,
then it is automatically true that e.g.
$L(\nabla, F(x), c) = M(\nabla, F(x), c)$ since $L = M$.
(To be clear, here we mean e.g. the derivative of $x \mapsto L({-}, F(x), c)$.)
Written in standard notation, this looks like
$$
\nabla\times(F(x)\times c) = (c\cdot\nabla)F(x) - (\nabla\cdot F(x))c.
$$
This shows where many people go wrong when manipulating $\nabla$ expressions;
because of the usual convention of "differentiate directly to the right",
we had to flip the first term around.
But this convention becomes unusable in a situation like $L(\nabla, F(x), G(x)) = M(\nabla, F(x), G(x))$;
this is still true since $L = M$.
But the following is false in conventional notation:
$$
\nabla\times(F(x)\times G(x)) \not= (\nabla\cdot G(x))F(x) - (\nabla\cdot F(x))G(x)
$$
since we need to differentiate both $F$ and $G$ in each expression,
not just one or the other as the conventional notation suggests.
What we might write to make this clear is
$$
\nabla\times(F(x)\times G(x)) = (\dot\nabla\cdot G(\dot x))F(\dot x) - (\dot\nabla\cdot F(\dot x))G(\dot x).
$$
To write this in conventional notation,
we have to use the subexpression rule to differentiate one function at a time:
$$\begin{aligned}
&\nabla\times(F(x)\times G(x))
\\
&\quad= \bigl(\nabla\cdot G(x)\bigr)F(x) + \bigl(G(x)\cdot\nabla\bigr)F(x)
- \bigl(\nabla\cdot F(x)\bigr)G(x) - \bigl(F(x)\cdot\nabla\bigr)G(x).
\end{aligned}$$
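Checking the four-term expansion numerically (illustrative fields as before; $(G\cdot\nabla)F$ is computed as the Jacobian of $F$ applied to $G(x)$):

```python
import numpy as np

def nabla(L, x, h=1e-6):
    """(Deriv) by central differences: sum_i d/dx^i [ L_x(e_i) ]."""
    return sum((L(x + h * e, e) - L(x - h * e, e)) / (2 * h)
               for e in np.eye(len(x)))

def jac(f, x, h=1e-6):
    """Total differential df(x; -) as a matrix; column i is df(x; e_i)."""
    return np.column_stack([(f(x + h * e) - f(x - h * e)) / (2 * h)
                            for e in np.eye(len(x))])

F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])
G = lambda x: np.array([np.sin(x[1]), x[0] * x[2], x[2]**2])
x0 = np.array([0.3, -1.2, 0.7])

# LHS: curl of F x G, via L_x(v) = v x (F(x) x G(x)).
lhs = nabla(lambda x, v: np.cross(v, np.cross(F(x), G(x))), x0)

# RHS: the four-term expansion; (G.nabla)F is the Jacobian of F applied to G.
JF, JG = jac(F, x0), jac(G, x0)
rhs = (np.trace(JG) * F(x0) + JF @ G(x0)
       - np.trace(JF) * G(x0) - JG @ F(x0))
assert np.allclose(lhs, rhs, atol=1e-4)
```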
Now consider the identity
$$
|v|^2|w|^2 = (v\cdot w)^2 + |v\times w|^2.
$$
Can we say that
$$
\nabla^2|F(x)|^2 = (\nabla\cdot F(x))^2 + |\nabla\times F(x)|^2?
$$
No: we can linearize the above as
$$
(u\cdot v)|w|^2 = (u\cdot w)(v\cdot w) + (u\times w)\cdot(v\times w)
$$
and take its second derivative, but each $\nabla$ must differentiate each copy of $F(x)$.
That is to say that
$$
\nabla^2|F(x)|^2
= (\dot\nabla\cdot F(\hat{\dot x}))(\hat\nabla\cdot F(\hat{\dot x}))
+ (\dot\nabla\times F(\hat{\dot x}))\cdot(\hat\nabla\times F(\hat{\dot x})).
$$
Both of these terms then need to be expanded with the subexpression rule into a total of 8 terms,
only two of which will be $(\nabla\cdot F(x))^2$ and $|\nabla\times F(x)|^2$.
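A numerical sketch makes the failure concrete (illustrative field as before): the Laplacian of $|F(x)|^2$, computed by second central differences, does not equal $(\nabla\cdot F(x))^2 + |\nabla\times F(x)|^2$.

```python
import numpy as np

def jac(f, x, h=1e-6):
    """Total differential df(x; -) as a matrix; column i is df(x; e_i)."""
    return np.column_stack([(f(x + h * e) - f(x - h * e)) / (2 * h)
                            for e in np.eye(len(x))])

def laplacian(f, x, h=1e-4):
    """sum_i d^2 f / (dx^i)^2 by second central differences."""
    return sum((f(x + h * e) - 2 * f(x) + f(x - h * e)) / h**2
               for e in np.eye(len(x)))

F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])
x0 = np.array([0.3, -1.2, 0.7])

lhs = laplacian(lambda x: F(x) @ F(x), x0)    # nabla^2 |F(x)|^2, about 8.0 here

J = jac(F, x0)                                # J[i, j] = dF_i/dx^j
div = np.trace(J)
curl = np.array([J[2, 1] - J[1, 2], J[0, 2] - J[2, 0], J[1, 0] - J[0, 1]])
rhs = div**2 + curl @ curl                    # about 2.14 here
assert not np.isclose(lhs, rhs, atol=1e-2)    # the naive identity fails
```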
Integral Representation
Again using the divergence as a starting point, it is well known that
$$
\nabla\cdot F(x) = \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\hat n(y)\cdot F(y)\,\dd S
$$
where $R_x$ is a region with non-zero volume containing $x$, $\partial R_x$ is its boundary, $\hat n$ is the outward-pointing unit normal to $\partial R_x$, $\dd S$ is the surface area measure on $\partial R_x$, and $y$ is the vector variable of integration. The limit is taken such that the regions $R_x$ decrease in diameter around $x$.
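Here is a Monte Carlo sketch of this representation; the radius, sample count, and test field are arbitrary choices, and the flux is sampled in antithetic pairs $\pm\hat n$ to tame the variance of its $O(1)$ part.

```python
import numpy as np

F = lambda x: np.array([x[0] * x[1], x[1] * x[2], x[0]**2])
x0, r = np.array([0.3, -1.2, 0.7]), 1e-3

# Uniform unit normals on the sphere bd(R_x) of radius r around x0.
rng = np.random.default_rng(0)
n = rng.normal(size=(100_000, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)

# Surface average of n.F over the sphere, sampled in antithetic pairs +-n.
flux = np.array([nh @ (F(x0 + r * nh) - F(x0 - r * nh)) / 2 for nh in n])

# (1/|R_x|) oint n.F dS = (|bd R_x|/|R_x|) * (surface average) = (3/r) * avg.
div_estimate = 3 / r * flux.mean()
assert abs(div_estimate - (x0[1] + x0[2])) < 0.05    # div F = x1 + x2
```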
Following the Motivation section we get
$$\begin{aligned}
\alpha(L_x(\nabla))
&= (\alpha\circ L_x)(\nabla)
= \nabla\cdot(\alpha\circ L_x)^\sharp
\\
&= \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\hat n(y)\cdot(\alpha\circ L_y)^\sharp\,\dd S
\\
&= \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}\alpha(L_y(\hat n(y)))\,\dd S
\\
&= \alpha\left(
\lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S
\right)
\end{aligned}$$
for $L_x : \R^m \to V$ and any $\alpha \in V^*$, hence
$$
L_x(\nabla) = \lim_{R_x\to 0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S
$$
where $y$ is the variable of integration.
This shows that $L_x(\nabla)$ can be interpreted as an average
over an infinitesimal neighborhood of $x$.
Combining this with the algebraic expression for the derivative
gives a geometric interpretation of the metric contraction of a tensor.
For any bilinear $M(v, w)$ that has an "antiderivative" $L_x(v)$,
i.e. $\diff L(x; v)(w) = M(v, w)$ for some $x$, we may write
$$
\Tr_{v\cdot w} M(v, w) = \lim_{R_x\to0}\frac1{|R_x|}\oint_{\partial R_x}L_y(\hat n(y))\,\dd S.
$$
Such an antiderivative is easy to construct in flat space: define $L_x(v) = M(x, v)$.
Following this train of thought to its conclusion will give
$$
\Tr_{v\cdot w} M(v, w)
= \frac1{|V_m|}\oint_{\partial V_m}M(y, y)\,\dd S
= \frac m{|\partial V_m|}\oint_{\partial V_m}M(y, y)\,\dd S
$$
where $V_m \subset \R^m$ is the unit ball centered at the origin,
$|V_m|$ its volume, and $|\partial V_m|$ its surface area.
This last expression is exactly $m$ times the average of $y \mapsto M(y, y)$ over the unit sphere.
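For instance, here is a Monte Carlo sketch with an arbitrary bilinear $M(v, w) = v\cdot Aw$, whose metric contraction is $\Tr A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))          # M(v, w) = v . (A w), so Tr_{v.w} M = Tr A

# Uniform samples y on the unit sphere in R^3.
y = rng.normal(size=(200_000, 3))
y /= np.linalg.norm(y, axis=1, keepdims=True)

# m times the sphere average of M(y, y) recovers the trace (m = 3 here).
sphere_avg = np.einsum('ki,ij,kj->k', y, A, y).mean()
assert np.isclose(3 * sphere_avg, np.trace(A), atol=0.02)
```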