When I learned the binomial theorem, I was first taught Pascal's triangle, then that the powers of the terms add up to the degree, etc. However, there seems to be no intuitive explanation for the multinomial theorem. Having it put simply let me understand the summation formula for the binomial expansion, but I can only find notation-heavy explanations online. Can anyone explain this in a simple way, in words?
-
Pascal's Pyramid, and then Pascal's Hyper-pyramid, work for $n=3$ and $n=4.$ – Adrian Keister Jan 06 '20 at 17:28
-
When I read the title of the question I suspected you wanted an intuitive proof of the theorem (which exists by a quick combinatorial argument). However, Pascal's Triangle doesn't really explain why the binomial theorem holds and so I don't see how that explains the binomial theorem for you. – Qi Zhu Jan 06 '20 at 17:32
-
@QiZhu I think Salman Khan has a video where he talks about each term on the triangle being the number of paths from the top of the pyramid to that term. Then, for example, in $(a+b)^2$ there's one way to get $a^2$, two to get $ab$, one to get $b^2$, hence 1 2 1. – jamie Jan 06 '20 at 17:36
-
https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:polynomials/x9e81a4f98389efdf:binomial/v/pascals-triangle-binomial-theorem – jamie Jan 06 '20 at 17:37
-
An analogous proof works for the multinomial theorem. Count the number of ways in which a monomial can be produced while multiplying. – Qi Zhu Jan 06 '20 at 18:02
-
The answer by @Milo Brandt is nice, and I just upvoted it. For a mainly computational approach (not as much why the factorials and such show up in the way they do, but describing how to efficiently carry out the computation for special cases by hand), see my answer to How to expand $(a_0+a_1x+a_2x^2+...a_nx^n)^2$? For a nice picture for $n=3$ (Pascal's Pyramid), see scrblnrd3's answer to General expanded form of $(x+y+z)^k$. – Dave L. Renfro Feb 19 '20 at 08:56
3 Answers
The best way is to connect multinomial coefficients to a certain counting problem - and this can be done very naturally. Note that if we want to calculate an expression such as $$(x_1+x_2+\ldots+x_k)^n$$ we could really just imagine writing $n$ copies of this term side-by-side and then distributing everything we could possibly distribute. For instance, suppose we wanted to calculate $$\newcommand{\x}{{\color{red} x}}\newcommand{\y}{{\color{blue} y}}\newcommand{\z}{{\color{green} z}}(\x+\y+\z)(\x+\y+\z)(\x+\y+\z)$$ where I've now colored the terms for a reason we will soon see. When you distribute, what you are really doing is taking a term from each of the three sums and multiplying those terms together - and then summing that up over all possible combinations of three terms. We could, of course, just write out every single possible sequence of three terms from $\{\x,\y,\z\}$ and we would get a correct expression for $(x+y+z)^3$: \begin{align*}&\x\x\x+\x\x\y+\x\x\z+\x\y\x+\x\y\y+\x\y\z+\x\z\x+\x\z\y+\x\z\z\\ +&\y\x\x+\y\x\y+\y\x\z+\y\y\x+\y\y\y+\y\y\z+\y\z\x+\y\z\y+\y\z\z\\ +&\z\x\x+\z\x\y+\z\x\z+\z\y\x+\z\y\y+\z\y\z+\z\z\x+\z\z\y+\z\z\z\end{align*} However, this is not a very efficient way, because we see that some terms are listed multiple times! For instance $\x^2\y =\x\x\y = \x\y\x = \y\x\x$ is listed three times - and $\x\y\z$ is listed six times!
So, the question becomes: how many times is $\x^a\y^b\z^c$ listed in the sum resulting from distributing $(\x+\y+\z)^n$? Well, that's the number of ways we can arrange a string of length $n$ from $a$ copies of $\x$ and $b$ copies of $\y$ and $c$ copies of $\z$. Otherwise said: it's the number of ways to color a set of $n$ distinct elements with three colors, specifying how many are to be red, green, and blue.
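The counting question above can be checked by brute force. The following is a minimal Python sketch, not part of the original answer; the helper name `expand_by_distribution` is made up for illustration. It lists every sequence of choices, one term per factor, and groups sequences that multiply to the same monomial.

```python
from collections import Counter
from itertools import product

def expand_by_distribution(variables, n):
    """Count how often each monomial appears when (v1 + ... + vk)^n is fully distributed."""
    counts = Counter()
    for choice in product(variables, repeat=n):
        # Multiplication commutes, so sorting identifies xxy, xyx and yxx.
        counts["".join(sorted(choice))] += 1
    return counts

counts = expand_by_distribution("xyz", 3)
print(counts["xxy"], counts["xyz"], sum(counts.values()))  # 3 6 27
```

The 27 sequences are exactly the 27 terms written out above, and the counts 3 and 6 match the repetitions noted for $x^2y$ and $xyz$.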
How might we calculate that quantity? Well, one approach is to simply define the multinomial coefficient to calculate that. A more useful approach is to think of a procedure for generating all such colorings. As an example to generalize from, suppose we wished to calculate how many ways we could arrange four terms, so that two of them were $\x$ and one each was $\y$ and $\z$. We could imagine that we start with an empty string consisting of four empty spaces, which we'll refer to as positions one through four:$$\cdot\cdot\cdot\,\cdot$$ We know that we first need to fill two of the positions with red $\x$'s, so we'll choose an empty position and put an $\x$ in it. There are $4$ ways to do this. $$\cdot\cdot\x\,\cdot$$ Now we need another red $\x$ somewhere. There are $3$ places to put it - so let's choose one. $$\x\cdot \x\,\cdot$$ Next, we want to put a blue $\y$ somewhere and we have $2$ choices $$\x\y \x\,\cdot$$ Then, finally, in the remaining space, we must put a green $\z$ $$\x\y \x\z$$ Essentially, our process is that we pick an exhaustive sequence of positions and greedily fill them with the first color that we don't yet have enough of. There are $4!$ choices total in this process, but some lead to the same solution - for instance, we could have started by putting an $\x$ in the first position and then put one in the third position. In general, there are two ways to reach any given sequence of two $\x$'s, one $\y$ and one $\z$, since we can choose in which order to place the $\x$'s - hence there will be $\frac{4!}2 = 12$ sequences with the given number of each symbol.
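The filling procedure just described can be simulated directly. This is a small Python sketch under the assumption that positions are numbered 0 to 3 rather than one to four; `greedy_fill` is a hypothetical helper name. It runs the procedure for every one of the $4!$ orders of positions and records which string each order produces.

```python
from collections import Counter
from itertools import permutations

def greedy_fill(order, symbols):
    """Visit positions in the given order, placing the symbols from left to right."""
    slots = [None] * len(order)
    for position, symbol in zip(order, symbols):
        slots[position] = symbol
    return "".join(slots)

# Two x's are placed first, then one y, then one z, for every order of the 4 positions.
ways = Counter(greedy_fill(order, "xxyz") for order in permutations(range(4)))
print(len(ways), set(ways.values()))  # 12 {2}
```

Every one of the 12 distinct strings is reached by exactly 2 of the 24 orders, which is the $\frac{4!}{2}$ count from the text. The order `(2, 0, 1, 3)` reproduces the worked example and yields `xyxz`.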
More broadly, if we wanted to do this process to generate $a$ red symbols, $b$ blue symbols and $c$ green symbols, we would have to (according to the process) place the red ones first, then the blue ones, then the green ones, but it wouldn't matter in which order we placed symbols within each group - hence each sequence with the desired counts of symbols would be generated by $a!b!c!$ processes of choosing one symbol at a time. If we have $n=a+b+c$, then there are $n!$ ways to pick one element at a time, so this gives a total of $\frac{n!}{a!b!c!}={n\choose a,b,c}$ sequences with the desired outcome. But remember! We're really counting the number of terms in the expansion of $(x+y+z)^{n}$ that reduce to $\x^a\y^b\z^c$ - so this would exactly be the coefficient of $\x^a\y^b\z^c$ in the expansion of $(x+y+z)^n$.
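As a sanity check of the formula $\frac{n!}{a!b!c!}$, here is a short Python sketch (the function names are illustrative, not from the original answer) that compares it against a direct count of strings with the required number of each symbol.

```python
from itertools import product
from math import factorial

def multinomial(*counts):
    """n! / (a! b! c! ...) with n = a + b + c + ..., in exact integer arithmetic."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

def brute_force_count(a, b, c):
    """Count strings over {x, y, z} with exactly a x's, b y's and c z's."""
    n = a + b + c
    return sum(
        1
        for s in product("xyz", repeat=n)
        if (s.count("x"), s.count("y"), s.count("z")) == (a, b, c)
    )

print(multinomial(2, 1, 1), brute_force_count(2, 1, 1))  # 12 12
```

The brute-force count visits all $3^n$ strings, so it is only practical for small $n$; it is there purely to confirm the formula.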
From this, generalizing to sums of arbitrarily many variables is simply a matter of adding more colors - and all the reasoning works out likewise to show that $$(x_1+x_2+\ldots+x_k)^n = \sum_{\substack{a_1,a_2,\ldots,a_k\\a_1+a_2+\ldots+a_k=n}}{n \choose a_1,a_2,\ldots,a_k}x_1^{a_1}x_2^{a_2}\ldots x_k^{a_k}$$ where ${n \choose a_1,a_2,\ldots,a_k} = \frac{n!}{a_1!a_2!\ldots a_k!}$.
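The general statement can also be tested numerically. The sketch below is an illustration only; `compositions` and `multinomial_expansion` are names introduced here. It evaluates the right-hand side at concrete integers and compares it with $(x_1+\ldots+x_k)^n$ computed directly.

```python
from math import factorial, prod

def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_expansion(xs, n):
    """Evaluate the right-hand side of the multinomial theorem at the numbers xs."""
    total = 0
    for exps in compositions(n, len(xs)):
        coeff = factorial(n)
        for a in exps:
            coeff //= factorial(a)  # stays an exact integer at every step
        total += coeff * prod(x ** a for x, a in zip(xs, exps))
    return total

xs = (2, 3, 5, 7)
print(multinomial_expansion(xs, 5) == sum(xs) ** 5)  # True
```

Agreement at a handful of integer points is evidence rather than proof, but it is a quick way to catch an index or factorial slip when writing the formula down.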
Note: this argument can be made rigorous in a fairly straightforward manner, but it very quickly runs into notational difficulty which would obscure the intuition (although it's not as bad as trying to put together an inductive argument, as students are sometimes asked to do!). The important lemma here is that if $I_1,\ldots,I_n$ are finite sets used as indices to a sum and for each $i\in I_j$ there is a value $v_i$, we have $$\prod_{j=1}^n\sum_{i\in I_j}v_i=\sum_{(i_1,\ldots,i_n)\in I_1\times \ldots \times I_n}\prod_{j=1}^n v_{i_j}$$ which is a rather opaque equation to come across if not explained well! What it says is that taking a product of sums is the same as summing up all the possible products of terms from the sums.
The rest of the argument is then looking at the set $I_1\times \ldots \times I_n$, which represents a choice of term for each sum in the product, and dividing it up based on the value of the inner product $\prod_{j=1}^n v_{i_j}$. The most literal translation of the argument above is that, since we can regard $I_1=I_2=\ldots=I_n$ (all the sums in the product are the same), we can, given the counts of appearances of each $i\in I_1$ we want in the tuple $(i_1,\ldots,i_n)$, essentially construct a function which takes a permutation of the indices $(1,\ldots,n)$ to the subset of $I_1\times \ldots \times I_n$ having the appropriate counts, and then calculate how many permutations map to each such tuple. It's easy to see how a proof involving so many indices could easily turn into an unreadable mess - and how it might involve a fair breadth of somewhat distant concepts to make things worse - especially if the author doesn't wish to use ellipses as I have done in this sketch.
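The lemma, that a product of sums equals the sum over all ways of picking one term from each sum, is easy to test on small lists of numbers. A minimal Python sketch, with arbitrary values chosen for illustration:

```python
from itertools import product
from math import prod

def product_of_sums(factors):
    """Multiply together the sums of the individual lists."""
    return prod(sum(values) for values in factors)

def sum_of_products(factors):
    """Pick one value from each list in every possible way and add up the products."""
    return sum(prod(choice) for choice in product(*factors))

factors = [[1, 2], [3, 4, 5], [6, 7]]
print(product_of_sums(factors), sum_of_products(factors))  # 468 468
```

In the multinomial setting all the lists are the same, for example `[[x, y, z]] * n`, which is the case the rest of the argument specializes to.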

One thing that can be said about the multinomial theorem is that to understand it you have to be at ease when using and analyzing expressions written in capital-sigma notation and capital-pi notation.
For fixed positive integers $m$ and $n$, consider the expression
$\tag 1 \displaystyle{\bigg( \sum _{j=1}^m\,x_j \bigg)^n}$
If you like algebra you'll quickly see that you can regard $\text{(1)}$ as a homogeneous polynomial of degree $n$.
OK, without committing to too much, it is agreed that we can write
$\tag 2 {\displaystyle \bigg( \sum _{j=1}^m\,x_j \bigg)^n=\sum _{k_{1}+k_{2}+\cdots +k_{m}=n} f(k_{1},k_{2},\ldots ,k_{m})\prod _{t=1}^{m}x_{t}^{k_{t}}\,}$
where $f: S \to \Bbb Z^{+}$ and $S = \{ \vec k \in \Bbb N^m \mid k_{1}+k_{2}+\cdots +k_{m}=n\}$.
All that remains is to understand why
$\tag 3 f(k_{1},k_{2},\ldots ,k_{m}) = \displaystyle {n \choose k_{1},k_{2},\ldots ,k_{m}}$
The explanation? The $\text{lhs}$ of $\text{(2)}$ is all about selecting one of the $m$ addends from each of the $n$ multiplicands; $f$ then collects and simplifies the 'mess' using the rules of algebra.
To form each term $f(k_{1},k_{2},\ldots ,k_{m})\prod _{t=1}^{m}x_{t}^{k_{t}}\,$ on the $\text{rhs}$, think of (or imagine) a set of $n$ distinct elements, one for each multiplicand, and begin by selecting your first block of $k_1$ of them to give the $x_1^{k_1}$ piece. There are $n - k_1$ elements left, and you 'pluck out' $k_2$ of them for the next piece, and so on. When you are done you have built up the $\prod _{t=1}^{m}x_{t}^{k_{t}}\,$ term as well as the number of ways you can do it, which is our multinomial coefficient.
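Written out, the count of ways for this block-by-block selection is a product of binomial coefficients. The following worked step is a standard identity added here for illustration; it is not spelled out in the answer itself. The factorials telescope:

$$\binom{n}{k_1}\binom{n-k_1}{k_2}\cdots\binom{n-k_1-\cdots-k_{m-1}}{k_m} = \frac{n!}{k_1!\,(n-k_1)!}\cdot\frac{(n-k_1)!}{k_2!\,(n-k_1-k_2)!}\cdots\frac{(n-k_1-\cdots-k_{m-1})!}{k_m!\,0!} = \frac{n!}{k_1!\,k_2!\cdots k_m!}$$

Each denominator factor $(n-k_1-\cdots-k_t)!$ cancels against the numerator of the next fraction, and the last one is $0!=1$ because $k_1+\cdots+k_m=n$.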

I don't know if this is the simplest way of understanding the multinomial formula, but here is an algebraic proof of it.
1-
Let $D(n;r)$ denote the set of all partitions of $\left \{ 1;...;n \right \}$ indexed by $\left \{ 1;...;r \right \}$, that is, all the ways to split $\left \{ 1;...;n \right \}$ into $r$ labelled subsets (some of which may be empty).
We can note that $D(n;r)= $ { the set of all partitions into $r$ subsets such that the first subset has $i_1=1$ element, the second subset has $i_2=3$ elements, ... the $l$-th subset has $i_l$ elements ... such that of course $i_1+i_2+...+i_l+...=n$ } $\cup$ { the set of all partitions into $r$ subsets such that the first subset has $i_1=4$ elements, the second subset has $i_2=1$ element, ... the $l$-th subset has $i_l$ elements ... such that of course $i_1+i_2+...+i_l+...=n$ } $\cup ...$
More generally, we write $D(n; i_1, i_2, ..., i_r)$ for the set of partitions of $\left \{ 1;...;n \right \}$ into $r$ subsets such that the first subset has $i_1$ elements, the second has $i_2$, ..., and of course $i_1+i_2+...+i_r=n$.
Thus we get $D(n;r)=\bigcup_{(i_1;i_2;...;i_r) \in \mathbb{N}^r \, \text{s.t.} \, i_1+...+i_r=n } D(n; i_1, ..., i_r)$. In other words, we redescribe $D(n;r)$ as a disjoint union of the smaller subsets $D(n; i_1, ..., i_r)$.
2-
Now let us focus on $D(n; i_1, ..., i_r)$. From here we know that there are exactly $|D(n;i_1,...,i_r)| = \frac{n!}{i_1! i_2! ... i_r!}$ elements in each $D(n;i_1,...,i_r)$.
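The size of $D(n;i_1,...,i_r)$ can be confirmed by brute force for small cases. The Python sketch below is an illustration under the assumption that a partition is encoded as a label in $\{0,...,r-1\}$ for each element; the function names are made up here.

```python
from itertools import product
from math import factorial

def count_ordered_partitions(n, sizes):
    """Count ways to split {1, ..., n} into labelled blocks with the given sizes."""
    r = len(sizes)
    count = 0
    for labels in product(range(r), repeat=n):  # labels[e] is the block of element e + 1
        if all(labels.count(block) == sizes[block] for block in range(r)):
            count += 1
    return count

def multinomial(n, sizes):
    """n! divided by the factorial of every block size, in exact integer arithmetic."""
    result = factorial(n)
    for size in sizes:
        result //= factorial(size)
    return result

sizes = (1, 3, 2)
print(count_ordered_partitions(6, sizes), multinomial(6, sizes))  # 60 60
```

Blocks of size $0$ are allowed, which matches the remark in step 3 that a power may be equal to $0$.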
3-
By definition $(x_1+x_2+...+x_r)^n=$ the sum of all possible products that can be made by multiplying $n$ elements chosen (with repetition allowed) from $ \left \{ x_1;x_2;...;x_r \right \} = $ the sum over each partition in $D(n;r)$, where the power of an $ x_i \in \left \{ x_1;x_2;...;x_r \right \}$ is given by the number of elements in its corresponding subset of the partition (possibly equal to $0$). Indeed, since we choose $n$ elements from $ \left \{ x_1;x_2;...;x_r \right \} $, the powers in each term of this big sum add up to $n$.
But in step 1 we gave a complete disjoint description of $D(n;r)$, hence
$(x_1+x_2+...+x_r)^n=\sum_{\pi \in D(n;r)}^{}x_1^{|\pi (index \, 1)|} \cdot ... \cdot x_r^{|\pi (index \, r)|}= \sum_{(i_1,...,i_r) \,s.t. \,i_1+...+i_r=n}^{}\binom{n}{i_1,i_2,...,i_r} x_1^{i_1} \cdot x_2^{i_2} \cdot ... \cdot x_r^{i_r}$
where $ \pi $ is a specific partition in $D(n;r)$ and where, for example, $|\pi (index \, 1)|$ gives the number of elements in the first subset, corresponding to $x_1$, in the given partition $\pi$. See here for more details with an example.
Note that $|\pi ( index \, 1)| + ... + |\pi ( index \, r)| = n$.
Q.E.D.
