
Applying the ordinary Leibniz rule multiple times leads to the general Leibniz rule for $n$th derivatives of $fg$, namely $(fg)^{(n)}=\sum_{k=0}^n\binom{n}{k}f^{(n-k)}g^{(k)}$. Is there an analogous formula for repeated gradients/divergences of a product of scalar functions? That is, $\nabla^n(fg)=\,?$ Of course $\nabla^nf$ is understood to mean $\cdots\nabla(\nabla\cdot(\nabla f))$, with $\nabla$ appearing $n$ times, each acting as a gradient or a divergence according to what makes sense.


While some identities involving $\nabla$ look very similar to the Leibniz rule, the problem is that $\nabla(X\cdot Y)\neq(\nabla X)Y+X\nabla Y$ for vector fields $X, Y$. (If equality held here, I think a form of the generalized Leibniz rule would follow easily.) Let's agree to keep expressions like this unevaluated, so that no curls appear, nor any operations other than gradients and divergences. I.e., the final formula can (and probably must) contain expressions like $\nabla[(\nabla f)\cdot(\nabla g)]$.

That said, if there is an alternative notation (e.g., tensor indices) that makes the formula easier to write, that would also be fine.

WillG
  • Are $f, g$ scalars or vectors? – J.G. Aug 23 '22 at 14:45
  • @J.G. Scalars. I've removed the $\cdot$ for clarity. – WillG Aug 23 '22 at 14:46
  • Split up according to whether $n$ is even or odd because it is either some number of Laplacian applications or there is an extra $\nabla$ leftover. – AHusain Aug 23 '22 at 14:53
  • Does this provide a starting point? https://math.stackexchange.com/questions/505215/show-that-nabla2fg-f-nabla2gg-nabla2f2-nabla-f-cdot-nabla-g – Toffomat Aug 23 '22 at 15:06
  • @Toffomat That shows the first step, but the problem is how to generalize to $n$ steps. And the generalization is not as straightforward as the one-dimensional case due to the subtlety explained below my question. – WillG Aug 23 '22 at 15:12

1 Answer


$ \newcommand\R{\mathbb R} \newcommand\PD[2]{\frac{\partial#1}{\partial#2}} $

We can sort of do this. The key is keeping track of what is being differentiated, which allows us to take derivatives piece by piece. So what does that mean?

The Leibniz Formula

Consider the scalar case where $f, g : \R \to \R$. (Of course, the domain could be smaller.) We want to compute $D^n(fg)$ with $D$ the derivative operator. We can split $D$ into two operators $D = D_f + D_g$ where $D_f$ only differentiates expressions involving $f$ and $D_g$ only expressions involving $g$. In the scalar case, this is simply the product rule: $$\begin{aligned} D(X(f)Y(g)) &= (D_f + D_g)(X(f)Y(g)) \\ &= [D_fX(f)]Y(g) + X(f)[D_gY(g)] \\ &= [DX(f)]Y(g) + X(f)[DY(g)], \end{aligned}$$ where here $X(f), Y(g)$ stand for expressions only involving $f, g$ respectively. The key is then that $D_f$ and $D_g$ commute; hence it immediately follows that $$ D^n(fg) = (D_f + D_g)^n(fg) = \left[\sum_{i=0}^n{n\choose i}D_f^iD_g^{n-i}\right](fg) = \sum_{i=0}^n{n\choose i}[D^if][D^{n-i}g]. $$
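If you want to convince yourself numerically, here is a quick SymPy sketch checking the formula for one particular $n$; the test functions are arbitrary choices, nothing canonical:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.exp(x) * sp.sin(x)   # arbitrary test function
g = x**3 + sp.cos(x)        # arbitrary test function
n = 5

# Leibniz: D^n(fg) = sum_i C(n,i) (D^i f)(D^(n-i) g)
lhs = sp.diff(f * g, x, n)
rhs = sum(sp.binomial(n, i) * sp.diff(f, x, i) * sp.diff(g, x, n - i)
          for i in range(n + 1))
print(sp.simplify(lhs - rhs))  # 0
```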

Moving on to Gradients

More generally, this splitting $D = D_f + D_g$ is a consequence of the multivariable chain rule (and hence the product rule is actually a consequence of the chain rule). We can, in fact, do the exact same thing with the gradient $\nabla$, and I will give a proof at the bottom of this post. What changes though is that we must keep track of where the gradient is since it is vectorial. In other words, $\nabla_f$ and $\nabla_g$ do not commute. For example, with $f, g : \R^N \to \R$ we could compute $\nabla^3(fg)$ by $$\begin{aligned} \nabla^3(fg) &= \nabla^2\dot\nabla(\dot fg + f\dot g) \\ &= \nabla^2([\nabla f]g + f[\nabla g]) \\ &= \nabla\dot\nabla([\nabla\dot f]g + [\nabla f]\dot g + \dot f[\nabla g] + f[\nabla\dot g]) \\ &= \nabla([\nabla^2 f]g + [\nabla g][\nabla f] + [\nabla f][\nabla g] + f[\nabla^2 g]) \\ &= \nabla([\nabla^2 f]g + 2[\nabla f]\cdot[\nabla g] + f[\nabla^2 g]) \\ &= \dot\nabla([\nabla^2\dot f]g + [\nabla^2 f]\dot g + 2[\nabla\dot f]\cdot[\nabla\dot g] + \dot f[\nabla^2 g] + f[\nabla^2\dot g]) \\ &= [\nabla^3 f]g + [\nabla^2 f][\nabla g] + 2\nabla[\nabla f]\cdot[\nabla g] + [\nabla f][\nabla^2 g] + f[\nabla^3 g]. \end{aligned}$$ Note that I take dot products as binding tighter than juxtaposition, so e.g. $\nabla[\nabla f]\cdot[\nabla g]$ means $\nabla([\nabla f]\cdot[\nabla g])$. Rather than a notation like $\nabla = \nabla_f + \nabla_g$, I've used a more compact overdot notation; the overdots specify what is being differentiated by $\dot\nabla$. This allows us to keep $\dot\nabla$ in the same place, which again is important since $\nabla$ is vectorial. In the 4th line, we see this in action; we've used the fact that $g$ is scalar valued to write $$ \dot\nabla[\nabla f]\dot g = \dot\nabla\dot g[\nabla f] = [\nabla g][\nabla f]. $$ But what is $[\nabla g][\nabla f]$? This is the geometric product of two vectors; your choice that $\nabla^n$ be alternating divergences and gradients corresponds to $\nabla$ being the vector derivative from geometric calculus. This is not hugely important right now; all we need to know is that $vw + wv = 2v\cdot w$ for vectors $v$ and $w$, which is what you see happening in the 5th line.
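Here is a componentwise SymPy check of this $\nabla^3$ identity, a sketch in which the ambient dimension and the test polynomials are arbitrary choices:

```python
import sympy as sp

X = sp.symbols('x y z')

def grad(h):
    """Gradient of a scalar field, as a component list."""
    return [sp.diff(h, v) for v in X]

def div(V):
    """Divergence of a vector field."""
    return sum(sp.diff(V[i], X[i]) for i in range(len(X)))

def lap(h):
    """Laplacian = div(grad(h)), i.e. the scalar nabla^2."""
    return div(grad(h))

def dot(U, V):
    return sum(u * v for u, v in zip(U, V))

f = X[0]**2 * X[1] + X[2]**3   # arbitrary test polynomial
g = X[0] * X[2] + X[1]**2      # arbitrary test polynomial

lhs = grad(lap(f * g))         # nabla^3(fg) = grad(div(grad(fg)))
# [nabla^3 f]g + [nabla^2 f][nabla g] + 2 nabla([nabla f].[nabla g])
#   + [nabla f][nabla^2 g] + f[nabla^3 g], componentwise:
rhs = [g * a + lap(f) * b + 2 * c + d * lap(g) + f * e
       for a, b, c, d, e in zip(grad(lap(f)), grad(g),
                                grad(dot(grad(f), grad(g))),
                                grad(f), grad(lap(g)))]
print([sp.simplify(l - r) for l, r in zip(lhs, rhs)])  # [0, 0, 0]
```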

In this example, we already see what's blocking us from a nice "separation of derivatives" like the Leibniz formula: the term $$ \nabla[\nabla f]\cdot[\nabla g] $$ i.e. the gradient of $[\nabla f]\cdot[\nabla g]$. This simply isn't expressible in terms of gradients of $f$ and $g$ (without resorting to coordinates). At least, I don't see any way it would be possible.

The Cross-Divergence and Bi-Laplacian

What we can do though is formulate a Leibniz-like formula using certain types of operators. Let $F, G$ be vector valued; define the cross-divergence as $$ F\Delta G := (\dot F\cdot\hat\nabla)(\dot\nabla\cdot\hat G) = \sum_{i=1}^N\sum_{j=1}^N\PD{F_i}{x_j}\PD{G_j}{x_i}. $$ The hat here is the same as the overdot, indicating that $\hat\nabla$ is differentiating $\hat G$. I've given the corresponding coordinate expression for clarity. We will also let $\Delta$ act on scalars $f, g$; analogously $$ f\Delta g := (\dot f\hat\nabla)\cdot(\dot\nabla\hat g) = (\nabla f)\cdot(\nabla g). $$ This shows that splitting $\nabla = \nabla_f + \nabla_g$ allows us to write $$ (\nabla_f\cdot\nabla_g)fg = f\Delta g. \tag{$*$} $$ We also define the bi-Laplacian $$ f\Delta^2 g := \bigl[(\nabla\dot f)\cdot\hat\nabla\bigr]\bigl[\dot\nabla\cdot(\nabla\hat g)\bigr] = \sum_{i=1}^N\sum_{j=1}^N\PD{^2f}{x_i\partial x_j}\PD{^2g}{x_i\partial x_j}. $$ This is exactly $(\nabla f)\Delta(\nabla g)$. We see that $$ [\nabla_f\cdot\nabla_g]^2fg = [(\nabla f)\cdot\nabla_g][\nabla_f\cdot(\nabla g)] = \bigl[(\nabla\dot f)\cdot\hat\nabla\bigr]\bigl[\dot\nabla\cdot(\nabla\hat g)\bigr] = f\Delta^2 g, $$ which will be crucial in our generalized Leibniz. The bi-Laplacian results in a product of an expression in $f$ and an expression in $g$; this allows us to iterate and define $\Delta^{2k}$. Define the operator $\delta$ by $\dot\delta f = (\nabla f)\cdot\dot\nabla$. Then $$ f\Delta^2 g = (\dot\delta\hat f)(\hat\delta\dot g), $$$$ f\Delta^4 g := (\dot\delta\hat f)\Delta^2(\hat\delta\dot g) = (\dot\delta^2\hat f)(\hat\delta^2\dot g), $$$$ f\Delta^{2(k+1)} g := (\dot\delta\hat f)\Delta^{2k}(\hat\delta\dot g) = (\dot\delta^{k+1}\hat f)(\hat\delta^{k+1}\dot g). $$ This $\delta$ is really just another way of writing $\nabla_f\cdot\nabla_g$, and in fact $$ [\nabla_f\cdot\nabla_g]^{2k}fg = f\Delta^{2k}g. $$ This suggests for vector-valued $F, G$ $$\begin{aligned}\ [\nabla_f\cdot\nabla_g]^{2k}F(f)\cdot G(g) &= [\nabla_f\cdot\nabla_g]^{2k}\sum_{i=1}^N F_i(f)G_i(g) = \sum_{i=1}^N F_i(f)\Delta^{2k}G_i(g) \\ &= \sum_{i=1}^N (\dot\delta^k F_i(\hat f))(\hat\delta^k G_i(\dot g)) = [\dot\delta^k F(\hat f)]\cdot[\hat\delta^k G(\dot g)], \end{aligned}$$ so we define the bi-Laplacians of $F, G$ as $$ F\Delta^{2k}G := (\dot\delta^k\hat F)\cdot(\hat\delta^k\dot G). $$ Now the operators $\Delta^{2k+1}$ are well defined; for scalar $f, g$ $$ f\Delta^{2k+1}g = [\nabla_f\cdot\nabla_g]^{2k+1}fg = [\nabla_f\cdot\nabla_g]^{2k}f\Delta g = (\nabla f)\Delta^{2k}(\nabla g), \tag{$**$} $$ recalling ($*$) above. As constructed, we have $$ (\nabla_f\cdot\nabla_g)f\Delta^k g = f\Delta^{k+1}g, $$ hence we also have $$ f\Delta^{2k+2}g = [\nabla_f\cdot\nabla_g]f\Delta^{2k+1}g = [\nabla_f\cdot\nabla_g](\nabla f)\Delta^{2k}(\nabla g) = (\nabla f)\Delta^{2k+1}(\nabla g). $$ We of course require that $$ f\Delta^0g := fg,\quad F\Delta^0G := F\cdot G. $$
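These coordinate expressions are easy to implement directly. The following SymPy sketch checks the claim $f\Delta^2 g = (\nabla f)\Delta(\nabla g)$; the test polynomials are arbitrary choices:

```python
import sympy as sp

X = sp.symbols('x y z')
N = len(X)

def grad(h):
    return [sp.diff(h, v) for v in X]

def cross_div(F, G):
    """F Delta G = sum_ij (dF_i/dx_j)(dG_j/dx_i), the cross-divergence."""
    return sum(sp.diff(F[i], X[j]) * sp.diff(G[j], X[i])
               for i in range(N) for j in range(N))

def bilap(f, g):
    """f Delta^2 g = sum_ij (d^2f/dx_i dx_j)(d^2g/dx_i dx_j)."""
    return sum(sp.diff(f, X[i], X[j]) * sp.diff(g, X[i], X[j])
               for i in range(N) for j in range(N))

f = X[0]**3 * X[1] + X[1] * X[2]**2   # arbitrary test polynomial
g = X[0] * X[1] * X[2] + X[2]**3      # arbitrary test polynomial

# f Delta^2 g should equal (grad f) Delta (grad g):
print(sp.simplify(bilap(f, g) - cross_div(grad(f), grad(g))))  # 0
```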


The Gradient Leibniz Formula

It is best here to use the $\nabla = \nabla_f + \nabla_g$ notation. We proceed much like with the Leibniz formula, but we break $\nabla^n$ into Laplacians so that we have commuting scalar operators. Start with the case $n = 2m$; then $$\begin{aligned} \nabla^n &= (\nabla_f + \nabla_g)^n \\ &= (\nabla_f^2 + 2\nabla_f\cdot\nabla_g + \nabla_g^2)^m \\ &= \sum_{i=0}^m{m\choose i}(2\nabla_f\cdot\nabla_g)^{m-i}(\nabla_f^2 + \nabla_g^2)^i \\ &= \sum_{i=0}^m\sum_{j=0}^i{m\choose i}{i\choose j}(2\nabla_f\cdot\nabla_g)^{m-i}(\nabla_f^2)^{i-j}(\nabla_g^2)^j. \end{aligned}$$ The prior discussion about cross-divergences and bi-Laplacians makes it clear that $$\begin{aligned} \nabla^nfg &= \sum_{i=0}^m\sum_{j=0}^i{m\choose i}{i\choose j}2^{m-i}\left[\bigl(\nabla^{2(i-j)}f\bigr)\Delta^{m-i}\bigl(\nabla^{2j}g\bigr)\right] \\ &= \sum_{j=0}^m{m\choose j}\bigl[\nabla^{2(m-j)}f\bigr]\bigl[\nabla^{2j}g\bigr] + n\sum_{j=0}^{m-1}{m-1\choose j}\bigl[\nabla^{2(m-j-1)+1}f\bigr]\cdot\bigl[\nabla^{2j+1}g\bigr] \\ &\qquad+ \sum_{i=0}^{m-2}\sum_{j=0}^i{m\choose i}{i\choose j}2^{m-i}\Bigl[\bigl(\nabla^{2(i-j)+1}f\bigr)\Delta^{m-i-1}\bigl(\nabla^{2j+1}g\bigr)\Bigr]. \end{aligned}$$ When $n = 2m+1$, we simply take the gradient of $\nabla^{2m}fg$. We can write $$\begin{aligned} \nabla^nfg = &\sum_{j=0}^m{m\choose j}\Bigl( \bigl[\nabla^{2(m-j)+1}f\bigr]\bigl[\nabla^{2j}g\bigr] + \bigl[\nabla^{2(m-j)}f\bigr]\bigl[\nabla^{2j+1}g\bigr] \Bigr) \\ &+ (n-1)\nabla\sum_{j=0}^{m-1}{m-1\choose j}\bigl[\nabla^{2(m-j-1)+1}f\bigr]\cdot\bigl[\nabla^{2j+1}g\bigr] \\ &+ \nabla\sum_{i=0}^{m-2}\sum_{j=0}^i{m\choose i}{i\choose j}2^{m-i}\Bigl[\bigl(\nabla^{2(i-j)+1}f\bigr)\Delta^{m-i-1}\bigl(\nabla^{2j+1}g\bigr)\Bigr]. \end{aligned}$$
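The integer tables in the Examples section below can be generated mechanically from the even-$n$ formula. Here is a short Python sketch that enumerates the raw terms $\binom{m}{i}\binom{i}{j}2^{m-i}\bigl(\nabla^{2(i-j)}f\bigr)\Delta^{m-i}\bigl(\nabla^{2j}g\bigr)$ before the reduction via ($**$):

```python
from math import comb

def terms(m):
    """For n = 2m, yield (coefficient, a, k, b) for each raw term
    C(m,i) C(i,j) 2^(m-i) (grad^a f) Delta^k (grad^b g),
    where a = 2(i-j), k = m-i, b = 2j (Delta^0 is plain multiplication)."""
    for i in range(m + 1):
        for j in range(i + 1):
            yield comb(m, i) * comb(i, j) * 2**(m - i), 2 * (i - j), m - i, 2 * j

for coeff, a, k, b in terms(2):   # n = 4; use terms(3) for n = 6
    print(f"{coeff} * (grad^{a} f) Delta^{k} (grad^{b} g)")
```

The $i = m$ rows give the purely scalar terms, $i = m-1$ gives the dot products of odd gradients, and $i \le m-2$ gives the genuine cross-divergence terms.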

Examples

Now let's test this out. When $n=2$, $$ \nabla^2fg = (\nabla^2f)g + f(\nabla^2 g) + 2(\nabla f)\cdot(\nabla g). $$

When $n = 3$, $$ \nabla^3fg = (\nabla^3f)g + (\nabla f)(\nabla^2g) + (\nabla^2 f)(\nabla g) + f(\nabla^3 g) + 2\nabla(\nabla f)\cdot(\nabla g), $$ which is indeed what we got before.

When $n = 4$, it will help to write out the relevant integers. For the first sum

| $j$ | ${m\choose j}$ | $2(m-j)$ | $2j$ |
|:-:|:-:|:-:|:-:|
| 0 | 1 | 4 | 0 |
| 1 | 2 | 2 | 2 |
| 2 | 1 | 0 | 4 |

and for the second sum

| $j$ | ${m-1\choose j}$ | $2(m-j-1)+1$ | $2j+1$ |
|:-:|:-:|:-:|:-:|
| 0 | 1 | 3 | 1 |
| 1 | 1 | 1 | 3 |

and for the third sum

| $i$ | $j$ | ${m\choose i}$ | ${i\choose j}$ | $m-i$ | $2(i-j)+1$ | $2j+1$ |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0 | 0 | 1 | 1 | 2 | 1 | 1 |

Hence $$ \nabla^4fg = (\nabla^4f)g + 2(\nabla^2f)(\nabla^2g) + f(\nabla^4g) + 4(\nabla^3f)\cdot(\nabla g) + 4(\nabla f)\cdot(\nabla^3 g) + 4(\nabla f)\Delta(\nabla g). $$
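Since $n = 4$ is even, $\nabla^4 = \Delta^2$ is scalar valued and the identity can be checked directly in SymPy; this is a sketch with arbitrarily chosen test polynomials:

```python
import sympy as sp

X = sp.symbols('x y z')
N = len(X)

grad = lambda h: [sp.diff(h, v) for v in X]
div  = lambda V: sum(sp.diff(V[i], X[i]) for i in range(N))
lap  = lambda h: div(grad(h))                      # scalar nabla^2
dot  = lambda U, V: sum(u * v for u, v in zip(U, V))
# F Delta G on vector fields, in coordinates:
cross = lambda F, G: sum(sp.diff(F[i], X[j]) * sp.diff(G[j], X[i])
                         for i in range(N) for j in range(N))

f = X[0]**2 * X[1]**2 + X[2]**4       # arbitrary test polynomial
g = X[0] * X[1] * X[2]**2 + X[1]**3   # arbitrary test polynomial

lhs = lap(lap(f * g))                 # nabla^4(fg) = Delta^2(fg), n even
rhs = (lap(lap(f)) * g + 2 * lap(f) * lap(g) + f * lap(lap(g))
       + 4 * dot(grad(lap(f)), grad(g))     # 4 (nabla^3 f).(nabla g)
       + 4 * dot(grad(f), grad(lap(g)))     # 4 (nabla f).(nabla^3 g)
       + 4 * cross(grad(f), grad(g)))       # 4 (nabla f) Delta (nabla g)
print(sp.simplify(lhs - rhs))               # 0
```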

When $n = 6$, the first sum has

| $j$ | ${m\choose j}$ | $2(m-j)$ | $2j$ |
|:-:|:-:|:-:|:-:|
| 0 | 1 | 6 | 0 |
| 1 | 3 | 4 | 2 |
| 2 | 3 | 2 | 4 |
| 3 | 1 | 0 | 6 |

and the second sum has

| $j$ | ${m-1\choose j}$ | $2(m-j-1)+1$ | $2j+1$ |
|:-:|:-:|:-:|:-:|
| 0 | 1 | 5 | 1 |
| 1 | 2 | 3 | 3 |
| 2 | 1 | 1 | 5 |

and the third sum has

| $i$ | $j$ | ${m\choose i}$ | ${i\choose j}$ | $m-i$ | $2(i-j)+1$ | $2j+1$ |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| 0 | 0 | 1 | 1 | 3 | 1 | 1 |
| 1 | 0 | 3 | 1 | 2 | 3 | 1 |
| 1 | 1 | 3 | 1 | 2 | 1 | 3 |

Hence we get $$\begin{aligned} \nabla^6fg = {}&(\nabla^6f)g + 3(\nabla^4f)(\nabla^2g) + 3(\nabla^2f)(\nabla^4g) + f(\nabla^6g) \\ &+ 6(\nabla^5f)\cdot(\nabla g) + 12(\nabla^3f)\cdot(\nabla^3g) + 6(\nabla f)\cdot(\nabla^5g) \\ &+ 8(\nabla f)\Delta^2(\nabla g) + 12(\nabla^3f)\Delta(\nabla g) + 12(\nabla f)\Delta(\nabla^3g). \end{aligned}$$
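This can be checked the same way; the sketch below implements $F\Delta G$ and $F\Delta^2 G$ in coordinates, with the test polynomials again arbitrary choices:

```python
import sympy as sp

X = sp.symbols('x y z')
N = len(X)
R = range(N)

grad = lambda h: [sp.diff(h, v) for v in X]
div  = lambda V: sum(sp.diff(V[i], X[i]) for i in R)
lap  = lambda h: div(grad(h))
lap3 = lambda h: lap(lap(lap(h)))
dot  = lambda U, V: sum(u * v for u, v in zip(U, V))
# F Delta G and F Delta^2 G on vector fields, in coordinates:
cd1 = lambda F, G: sum(sp.diff(F[i], X[j]) * sp.diff(G[j], X[i])
                       for i in R for j in R)
cd2 = lambda F, G: sum(sp.diff(F[k], X[i], X[j]) * sp.diff(G[k], X[i], X[j])
                       for k in R for i in R for j in R)

f = X[0]**3 * X[1]**2 * X[2] + X[0] * X[2]**5   # arbitrary test polynomial
g = X[0]**2 * X[1] * X[2]**3 + X[1]**6          # arbitrary test polynomial

lhs = lap3(f * g)   # nabla^6(fg) = Delta^3(fg), n even
rhs = (lap3(f) * g + 3 * lap(lap(f)) * lap(g) + 3 * lap(f) * lap(lap(g))
       + f * lap3(g)
       + 6 * dot(grad(lap(lap(f))), grad(g))    # 6 (nabla^5 f).(nabla g)
       + 12 * dot(grad(lap(f)), grad(lap(g)))   # 12 (nabla^3 f).(nabla^3 g)
       + 6 * dot(grad(f), grad(lap(lap(g))))    # 6 (nabla f).(nabla^5 g)
       + 8 * cd2(grad(f), grad(g))              # 8 (nabla f) D^2 (nabla g)
       + 12 * cd1(grad(lap(f)), grad(g))        # 12 (nabla^3 f) D (nabla g)
       + 12 * cd1(grad(f), grad(lap(g))))       # 12 (nabla f) D (nabla^3 g)
print(sp.expand(lhs - rhs))                     # 0
```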

Splitting the Gradient

Here is the promised proof that $\nabla = \nabla_f + \nabla_g$ makes sense, or equivalently that for any $h : \R^N\times\R^N \to \R$ $$ \nabla h(x, x) = \dot\nabla h(\dot x, x) + \dot\nabla h(x,\dot x). $$ This generalizes in the obvious way to any finite number of arguments $h(x, x, x, \dotsc)$.

For any $F : \R^p \to \R^q$ and any $p, q$, let $DF_x : \R^p \to \R^q$ be the total (Fréchet) derivative at $x \in \R^p$. For $F_1, F_2 : \R^N \to \R^N$, consider a function $h : \R^N\oplus \R^N \to \R$, where we put the obvious inner product/norm on $\R^N\oplus\R^N$. For any vector $v \in \R^N$ $$\begin{aligned} v\cdot\nabla h(F_1(x)\oplus F_2(x)) &= D[h(F_1\oplus F_2)]_x(v) \\ &= Dh_{F_1(x)\oplus F_2(x)}\circ D[F_1\oplus F_2]_x(v) \\ &= (\partial h)\cdot\bigl(D[F_1\oplus F_2](v)\bigr) \\ &= (\partial h)\cdot\bigl(D[F_1](v)\oplus D[F_2](v)\bigr) \\ &= (\partial h)\cdot D[F_1](v) + (\partial h)\cdot D[F_2](v) \\ &= (\dot\nabla h(\dot x, x))\cdot D[F_1](v) + (\dot\nabla h(x,\dot x))\cdot D[F_2](v). \end{aligned}$$ In the third line, we've made the points of differentiation implicit and used the gradient $\partial h$ of $h$; in the last two lines, $\partial h$ splits into its two $\R^N$ components, which are exactly the partial gradients $\dot\nabla h(\dot x, x)$ and $\dot\nabla h(x, \dot x)$. Choosing $F_1, F_2$ to both be the identity, we see $$ v\cdot\nabla h(x\oplus x) = v\cdot\dot\nabla h(\dot x\oplus x) + v\cdot\dot\nabla h(x\oplus\dot x), $$ hence $$ \nabla h(x\oplus x) = \dot\nabla h(\dot x\oplus x) + \dot\nabla h(x\oplus \dot x). $$ The use of $\R^N\oplus\R^N$ was simply an illustrative tool; the same result holds for $h : \R^N\times\R^N \to \R$.
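As a concrete sanity check, take $h(x, y) = f(x)g(y)$. Then $\dot\nabla h(\dot x, x) = g(x)\nabla f(x)$ and $\dot\nabla h(x, \dot x) = f(x)\nabla g(x)$, so the splitting reduces to the familiar product rule: $$ \nabla[f(x)g(x)] = g(x)\nabla f(x) + f(x)\nabla g(x). $$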