Why does $n$-time differentiation of product have the same structure as raising sum to $n$th power?

Question

A formula for differentiating a product is well known:

$$(ab)'=a'b+ab'.$$

At first sight it doesn't resemble anything interesting. But what if we differentiate twice? We'll get

$$(ab)''=a''b+2a'b'+ab''.$$

Now this does resemble something:

$$(a+b)^2=a^2b^0+2a^1b^1+a^0b^2.$$

Of course, going to the next level now gives us the expected result:

$$(ab)'''=a'''b+3a''b'+3a'b''+ab''',$$

in full accordance with the known binomial expansion:

$$(a+b)^3=a^3b^0+3a^2b^1+3a^1b^2+a^0b^3.$$

I know that the binomial expansion stems from the distributive law of multiplication over addition. That case is easy, because integer powers can be interpreted as iterated multiplication. But I can't see any similar distributive law of differentiation over a product.

Observing this, I wonder: how can one explain this striking similarity between seemingly unrelated things? Can it be explained by some generalized form of the distributive law?

Computing the $n$th derivative is applying the $n$th power of the differential operator – Hagen von Eitzen, Oct 29 '14 at 19:15
@HagenvonEitzen yeah, I tried to think in this direction, but I still couldn't see how it really is related: we don't have a binomial in differential operator... or do you mean the finite difference it originates from?.. Still, it's a difference, not a sum... — Ruslan, Oct 29 '14 at 19:17
Perhaps the Taylor series might be more illuminating? (Just throwing around ideas.) — Akiva Weinberger, Oct 29 '14 at 19:21
Highly related (duplicate?) http://math.stackexchange.com/questions/135510/how-is-leibnizs-rule-for-the-derivative-of-a-product-related-to-the-binomial-fo — PhoemueX, Oct 29 '14 at 21:54

score 9 · Accepted Answer · answered Oct 31 '14 at 23:39

This can be shown easily using bialgebras.

First, introduce the tensor product, denoted $\otimes$, which has the property \begin{equation} (a\otimes b)(c\otimes d)=ac\otimes bd. \end{equation}

Then introduce the multiplication map, denoted $m$, which acts as, for example, \begin{equation} m(a\otimes b + c\otimes d+\dots) = ab+cd+\dots \end{equation} i.e. $m$ turns each tensor product into an ordinary product.

And let's denote the derivative operator by $\partial$.

Then the Leibniz rule "simply" states: \begin{equation} \partial(ab)=m[(\partial\otimes1+1\otimes\partial)(a\otimes b)] \end{equation} It's easy to see that this must hold. The left-hand side is simply $(ab)'$. The right-hand side is: \begin{align} m[(\partial\otimes1+1\otimes\partial)(a\otimes b)] &=m(\partial a \otimes b + a \otimes \partial b)\\ &=m(a' \otimes b + a \otimes b')\\ &=a'b+ab' \end{align}

So far so good! By the way, $\Delta\partial\equiv\partial\otimes1+1\otimes\partial$ is called the coproduct of $\partial$. It has the property that $(\Delta h)^n=\Delta(h^n)$, which leads to the property $f(\Delta h)=\Delta f(h)$, and it is used to see how an operator acts on a product, e.g.:

\begin{equation} \partial(ab)=m[\Delta\partial(a\otimes b)] \end{equation}

Now, we want to act on the product with the derivative operator $n$ times: \begin{equation} (ab)^{(n)}=\partial^n(ab) =m[\Delta(\partial^n)(a\otimes b)] =m[(\Delta\partial)^n(a\otimes b)] \end{equation}

So we only need to calculate $(\Delta\partial)^n$. But this is simply a binomial raised to the $n$-th power: \begin{equation} (\Delta\partial)^n=(\partial\otimes1+1\otimes\partial)^n \end{equation}

And this is where all those binomial coefficients come from.
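Spelled out, the last step can be sketched as follows. The only extra fact assumed here is that $\partial\otimes1$ and $1\otimes\partial$ commute (by the tensor product rule above, both orders give $\partial\otimes\partial$), so the ordinary binomial theorem applies:

\begin{align} (\partial\otimes1+1\otimes\partial)^n &=\sum_{k=0}^n\binom nk(\partial\otimes1)^k(1\otimes\partial)^{n-k} =\sum_{k=0}^n\binom nk\,\partial^k\otimes\partial^{n-k},\\ (ab)^{(n)} &=m\left[\sum_{k=0}^n\binom nk\,\partial^k a\otimes\partial^{n-k}b\right] =\sum_{k=0}^n\binom nk\,a^{(k)}b^{(n-k)}. \end{align}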

score 2 · Answer 2 · answered Oct 31 '14 at 23:55

Suppose that we know that $(fg)^{(n)}$ is some linear combination of products of various derivatives, and we want to find the coefficients. Take $f(t) = e^{\alpha t}$ and $g(t) = e^{\beta t}$. Then $$ (fg)^{(n)} = (e^{(\alpha+\beta)t})^{(n)} = (\alpha + \beta)^n e^{(\alpha+\beta)t}. $$ On the other hand, $$ f^{(i)} g^{(j)} = \alpha^i \beta^j e^{(\alpha+\beta)t}. $$ Comparing coefficients, the binomial theorem directly implies the expression $$ (fg)^{(n)} = \sum_{k=0}^n \binom{n}{k} f^{(k)} g^{(n-k)}. $$
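For instance, taking $n=2$ as a worked case: $$ (fg)'' = (\alpha+\beta)^2 e^{(\alpha+\beta)t} = \alpha^2 e^{(\alpha+\beta)t} + 2\alpha\beta\, e^{(\alpha+\beta)t} + \beta^2 e^{(\alpha+\beta)t} = f''g + 2f'g' + fg''. $$ Comparing coefficients is legitimate because, if $(fg)^{(n)} = \sum c_{ij} f^{(i)} g^{(j)}$ with constants $c_{ij}$ that do not depend on $f$ and $g$, then $(\alpha+\beta)^n = \sum c_{ij}\alpha^i\beta^j$ must hold for all $\alpha,\beta$, and the monomials $\alpha^i\beta^j$ are linearly independent as polynomials.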

It is easy to prove by induction that $(fg)^{(n)}$ is indeed some linear combination of products of various derivatives, and in fact the same is true for an expression like $(fgh)^{(n)}$. The same reasoning now directly implies that $$ (fgh)^{(n)} = \sum_{i+j+k = n} \frac{n!}{i!j!k!} f^{(i)} g^{(j)} h^{(k)}, $$ and similar formulas work for even more variables.
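The three-factor formula can be sanity-checked numerically. The sketch below is only an illustration: it uses polynomials (stored as integer coefficient lists) as test functions instead of exponentials, and all function names (`derivative`, `multiply`, `leibniz3`, and so on) are made up for this example.

```python
from math import factorial

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def derivative(p, times=1):
    """Differentiate a polynomial [c0, c1, c2, ...] the given number of times."""
    for _ in range(times):
        p = [k * c for k, c in enumerate(p)][1:]
    return p or [0]

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out

def add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def leibniz3(f, g, h, n):
    """Right-hand side: sum over i+j+k=n of n!/(i!j!k!) f^(i) g^(j) h^(k)."""
    total = [0]
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j
            coeff = factorial(n) // (factorial(i) * factorial(j) * factorial(k))
            term = multiply(multiply(derivative(f, i), derivative(g, j)),
                            derivative(h, k))
            total = add(total, [coeff * c for c in term])
    return trim(total)

def direct(f, g, h, n):
    """Left-hand side: multiply first, then differentiate n times."""
    return trim(derivative(multiply(multiply(f, g), h), n))
```

Calling `leibniz3(f, g, h, n)` and `direct(f, g, h, n)` on the same inputs should return the same coefficient list; since the arithmetic is exact integer arithmetic, there is no rounding to worry about.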

answered Oct 31 '14 at 23:55 · Yuval Filmus

Bill Dubuque actually has the same answer (independently) to http://math.stackexchange.com/questions/135510/how-is-leibnizs-rule-for-the-derivative-of-a-product-related-to-the-binomial-fo, so this must be a folklore argument. – Yuval Filmus Oct 31 '14 at 23:56
This is good for exponentials, but doesn't seem to nicely generalize to general Fourier-represented functions... – Ruslan Nov 01 '14 at 07:12
It's a formal argument. We already know that $(fg)^{(n)}$ is some linear combination of $f^{(i)}g^{(j)}$, and we use this argument just to find the coefficients. – Yuval Filmus Nov 01 '14 at 07:17

Milo Brandt · Answer 3 · 2014-11-01T00:07:30.580

I hope this site is ready for some funky notation, because here it comes (but maybe not immediately):

Firstly, let's denote $a'$ as $Da$ instead and $a''$ as $D^2 a$ and so on, where $D$ is the map taking $f$ to its derivative. Then, the expression $$Da\,b+a\,Db$$ looks to me like the sum of two expressions, acting on the pair $(a,b)$ - the first expression, $Da\,b$ takes the derivative of the first and multiplies it by the second. The second expression does that, in reverse, more or less.

To formalize this a bit (this is where the funky notation comes in; the curly braces do not refer to sets), let's denote the first expression, where we map $(a,b)$ to $Da\,b$, as $\{D,1\}$ and the second expression as $\{1,D\}$. Then, we are basically saying that the derivative of $ab$ equals the application of $$\{1,D\}+\{D,1\}$$ to $(a,b)$.

A notable property of this notation is that we can write $ab$ as the application of $\{1,1\}$ to $(a,b)$: we multiply $a$ and $b$ each by $1$, then multiply the results together. So the product rule says $$D(\{1,1\}\cdot (a,b))=(\{1,D\}+\{D,1\})\cdot(a,b)$$ where $\{x,y\}\cdot (a,b)$ is the application of $\{x,y\}$ to $(a,b)$, that is, $(xa)(yb)$. If we apply this a second time we get $$D^2(\{1,1\}\cdot (a,b))=D((\{1,D\}+\{D,1\})\cdot(a,b)).$$ Since the derivative is linear and the product rule applies to each term separately, differentiating has the effect of multiplying the curly-brace operator by $\{1,D\}+\{D,1\}$ once more, which gives $$D^2(\{1,1\}\cdot (a,b))=(\{1,D\}+\{D,1\})^2\cdot(a,b)$$ $$D^2(\{1,1\}\cdot (a,b))=(\{1,D^2\}+2\{D,D\}+\{D^2,1\})\cdot(a,b),$$ where we'd expand the right side as $ab''+2a'b'+a''b$. (Note that we're taking the product of curly-brace operators to be $\{x_1,y_1\}\{x_2,y_2\}=\{x_1x_2,y_1y_2\}$.)

More generally, we will get $$D^n(\{1,1\}\cdot (a,b))=(\{1,D\}+\{D,1\})^n\cdot(a,b)$$ where we clearly have an exponent over a sum, which yields the relation. All of the above could be formalized, if so desired, but given that the question seems to ask for intuition, that's likely unnecessary.
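The curly-brace calculus can be mimicked in a few lines of code. This is a minimal sketch under an encoding chosen for illustration only: the operator $\{D^i,D^j\}$ is stored as the key `(i, j)`, a sum of such operators is a dictionary from keys to integer coefficients, and the names `op_product`, `op_power` and `PRODUCT_RULE` are hypothetical.

```python
from collections import Counter
from math import comb

# {1,D} + {D,1}: differentiate the second factor, or the first one.
PRODUCT_RULE = {(0, 1): 1, (1, 0): 1}

def op_product(s, t):
    """Multiply two formal sums using {x1,y1}{x2,y2} = {x1*x2, y1*y2}.

    Powers of D multiply by adding exponents, so keys add componentwise.
    """
    out = Counter()
    for (i1, j1), c1 in s.items():
        for (i2, j2), c2 in t.items():
            out[(i1 + i2, j1 + j2)] += c1 * c2
    return dict(out)

def op_power(s, n):
    """n-th power of a formal sum, starting from the identity {1,1}."""
    result = {(0, 0): 1}
    for _ in range(n):
        result = op_product(result, s)
    return result

def binomial_sum(n):
    """The expected answer: coefficient C(n,k) on {D^k, D^(n-k)}."""
    return {(k, n - k): comb(n, k) for k in range(n + 1)}
```

With this encoding, `op_power(PRODUCT_RULE, n)` should agree with `binomial_sum(n)`, which is the statement $(\{1,D\}+\{D,1\})^n=\sum_k\binom nk\{D^k,D^{n-k}\}$.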

(Though, ultimately, this is basically the same as Danijel's answer, which was posted before I finished this, just expressed in different terms)

score 2 · Answer 4 · answered Oct 29 '14 at 19:44

Well, there are many places in calculus or algebra where this similarity appears. I didn't understand exactly what you meant by "explanation of this similarity", but the similarity can easily be proven by induction over $n$ ($n$ being the number of differentiations).

We want to prove that $(ab)^{(n)} = \sum^{n}_{i=0}\binom{n}{i}a^{(n-i)}b^{(i)}$.

Well, it works for $n=2$. Now let's suppose it works for $n$ and prove that it works for $n+1$. Just take the derivative of $(ab)^{(n)}$.

$(ab)^{(n+1)} = \sum^{n}_{i=0}\binom{n}{i}(a^{(n-i)}b^{(i)})' = \sum^{n}_{i=0}\binom{n}{i}(a^{(n+1-i)}b^{(i)} + a^{(n-i)}b^{(i+1)} )$

$(ab)^{(n+1)} = \sum^{n}_{i=0}\binom{n}{i}a^{(n+1-i)}b^{(i)} + \sum^{n}_{i=0}\binom{n}{i}a^{(n-i)}b^{(i+1)}$

$(ab)^{(n+1)} = a^{(n+1)}b + \sum^{n}_{i=1}\binom{n}{i}a^{(n+1-i)}b^{(i)} + \sum^{n+1}_{i=1}\binom{n}{i-1}a^{(n+1-i)}b^{(i)} $

$(ab)^{(n+1)} = a^{(n+1)}b + \sum^{n}_{i=1}\binom{n}{i}a^{(n+1-i)}b^{(i)} + \sum^{n}_{i=1}\binom{n}{i-1}a^{(n+1-i)}b^{(i)} + ab^{(n+1)}$

Here, you apply Pascal's rule, $\binom{n}{i}+\binom{n}{i-1}=\binom{n+1}{i}$, which gives:

$(ab)^{(n+1)} = a^{(n+1)}b + \sum^{n}_{i=1}\binom{n+1}{i}a^{(n+1-i)}b^{(i)} + ab^{(n+1)}$

$(ab)^{(n+1)} = \sum^{n+1}_{i=0}\binom{n+1}{i}a^{(n+1-i)}b^{(i)} $

Which is exactly what we wanted to prove.
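To see the induction step in a concrete case, here is the passage from $n=2$ to $n=3$ written out, purely as an illustration of the general computation above:

$$\begin{aligned} (ab)''' &= (a''b+2a'b'+ab'')'\\ &= (a'''b+a''b') + 2(a''b'+a'b'') + (a'b''+ab''')\\ &= a'''b + \left[\tbinom21+\tbinom20\right]a''b' + \left[\tbinom22+\tbinom21\right]a'b'' + ab'''\\ &= a'''b+3a''b'+3a'b''+ab'''. \end{aligned}$$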

score 2 · Answer 5 · answered Oct 29 '14 at 21:57

The desired "distributive law of differentiation over product" is exactly the law $(ab)' = a'b + ab'$. What it says is that to differentiate a product, you choose which factor to differentiate, $a$ or $b$, and you get one term for each choice. So if you differentiate $n$ times (without gathering like terms), you'll get $2^n$ terms, one for each sequence of choices; when you gather like terms, you group them by the number of times you chose to differentiate the first factor, call that $k$, and there are $\binom nk$ ways you could have done that.

Analogously, when multiplying something by $(a+b)$, like $(a+b)c = ac+bc$, you choose which term of the binomial to multiply $c$ by, $a$ or $b$, and you get one term for each choice. So if you multiply by $(a+b)^n$, you get $2^n$ terms, and so on.
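The counting argument can be checked by brute force. The sketch below (the function name `count_terms` is made up for this illustration) enumerates every sequence of $n$ choices of which factor to differentiate and groups the $2^n$ resulting terms by how often each factor was chosen.

```python
from collections import Counter
from itertools import product
from math import comb

def count_terms(n):
    """Group the 2**n ungathered terms of (ab)^(n) by derivative orders.

    Each step of differentiation picks one factor, 'a' or 'b'. A sequence
    that picks 'a' exactly k times yields the term a^(k) b^(n-k), recorded
    here under the key (k, n - k).
    """
    counts = Counter()
    for choices in product("ab", repeat=n):
        k = choices.count("a")
        counts[(k, n - k)] += 1
    return dict(counts)
```

The multiplicity of the key `(k, n - k)` should come out as `comb(n, k)`, and the multiplicities should add up to `2 ** n`.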

To make this analogy formal, you could consider $M$, the free $\mathbb Z$-module over $\mathbb N^2$, that is, the set of formal sums of pairs $(p,q)$ of natural numbers, define the operators $L(p,q) = (p+1,q)$ and $R(p,q)=(p,q+1)$, define $L+R$ pointwise as usual, and then (noting that $L$ and $R$ commute) compute $$ (L+R)^n(0,0) = \sum_{k=0}^n \binom nk L^k R^{n-k} (0,0) = \sum_{k=0}^n \binom nk (k,n-k) \tag{$\ast$} $$ This module $M$ is isomorphic to the module of polynomials in $a$ and $b$, via $(p,q)\mapsto a^p b^q$; $L$ corresponds to multiplication by $a$, $R$ to multiplication by $b$, and ($\ast$) corresponds to the binomial theorem. $M$ also maps onto the submodule of $C^\infty(\mathbb R)$ generated by functions of the form $a^{(p)} b^{(q)}$ (here $a$ and $b$ are fixed elements of $C^\infty(\mathbb R)$), via $(p,q)\mapsto a^{(p)} b^{(q)}$; under this map $L+R$ corresponds to differentiation, and ($\ast$) corresponds to the differentiation behaviour you've noted. (The map is an isomorphism when the products $a^{(p)} b^{(q)}$ are linearly independent.)

From this point of view, the root cause is that multiplication is just a special case of composition of operators, and it's really composition of operators that distributes over pointwise addition, yielding the binomial theorem when operators commute.

score 0 · Answer 6 · edited Oct 31 '14 at 22:44

Assuming that the $n$th derivative of $ab$ has terms like $$\binom nk a^{(k)}b^{(n-k)}$$ the coefficient of $a^{(k+1)}b^{(n-k)}$ (which appears in the $(n+1)$-th derivative) is $$\binom {n}{k}+\binom{n}{k+1}=\binom{n+1}{k+1}$$

which is obtained by differentiating the terms $$\binom nk a^{(k)}b^{(n-k)}+\binom n{k+1}a^{(k+1)}b^{(n-k-1)}$$

The similarity arises because the same sum of coefficients appears on $a^{k+1}b^{n-k}$ when you multiply $$\left(\binom nk a^kb^{n-k}+\binom n{k+1}a^{k+1}b^{n-k-1}\right)(a+b)$$

Not sure if this answers your question, though.
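The recurrence described above can be run directly. In this sketch (function names are hypothetical), `row[k]` holds the coefficient of $a^{(k)}b^{(n-k)}$ in the $n$th derivative; one more differentiation sends each term to two neighbouring terms, which is the same bookkeeping as multiplying a polynomial's coefficient row by $(a+b)$.

```python
from math import comb

def next_row(row):
    """Coefficients of the (n+1)-th derivative from those of the n-th.

    The term a^(k) b^(n-k) contributes its coefficient both to
    a^(k+1) b^(n-k) (derivative lands on a) and to a^(k) b^(n-k+1)
    (derivative lands on b).
    """
    new = [0] * (len(row) + 1)
    for k, c in enumerate(row):
        new[k + 1] += c  # derivative lands on a
        new[k] += c      # derivative lands on b
    return new

def leibniz_coefficients(n):
    """Start from (ab)^(0) = ab, i.e. the row [1], and differentiate n times."""
    row = [1]
    for _ in range(n):
        row = next_row(row)
    return row
```

For each `n`, `leibniz_coefficients(n)` should reproduce the binomial coefficients `comb(n, k)` for `k = 0, ..., n`.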

edited Oct 31 '14 at 22:44 · Jonas Meyer

answered Oct 29 '14 at 19:25 · ajotatxe

Is the lack of right parens intentional? – Semiclassical Oct 29 '14 at 19:35
The notation for the $n$th derivative of a function $f$ that I've most often seen is $f^{(n}$. Feel free to edit if this notation is not standard. – ajotatxe Oct 29 '14 at 19:37
Ah. I've usually seen it as $f^{(n)}$. – Semiclassical Oct 29 '14 at 19:38
