How do I compute binomial coefficients efficiently?

Question

I'm trying to reproduce Excel's COMBIN function in C#. The number of combinations is as follows, where number = n and number_chosen = k:

$${n \choose k} = \frac{n!}{k! (n-k)!}.$$

I can't use this formula because the factorial overflows the computer's capacity really quick. Int32 can only do up to 12!, Int64 up to 20!, and double up to a whopping 170! after that I'm on my own.

How can I get the same results with a formula more gentle to the computer?

Also, I hope you don't mind that I edited the title. "Group" has a different and precise meaning in mathematics. — Qiaochu Yuan, Sep 26 '12 at 01:56
Are you looking for an exact answer or a floating point approximation? — copper.hat, Sep 26 '12 at 02:08
Not quite what the OP is after, but certainly an interesting read for anyone interested in an efficient way of evaluating binomial coefficient for large arguments. — Sasha, Sep 26 '12 at 02:50
@Sasha - You're onto something here. What does Xmath.Product(primes, 0, fi) do? How can I implement it? — Manuel, Sep 26 '12 at 03:16
If using .Net 4.0 or greater you can use BigInteger: http://msdn.microsoft.com/en-us/library/system.numerics.biginteger(v=vs.110).aspx — leonbloy, Jan 05 '14 at 03:27
i would agree with @RossMilikan answer, plus use logarithms and compute the result iteratively adding and substracting as needed (in order) to prevent overflows — Nikos M., Mar 04 '15 at 21:05

score 27 · Answer 1 · edited Jun 13 '18 at 15:26

André Nicolas correctly identifies that we should maybe use: $$\binom{n}{k}=\frac{n}{k}\binom{n-1}{k-1}$$

But we should also use:

$$\binom{n}{k}=\binom{n}{n-k}$$

And this also seems important:

$$\binom{n}{0} = \frac{n!}{(n-0)!} = \frac{n!}{n!} = 1$$

Great, we have everything we need to implement the function recursively!

I like to implement n choose k in a functional language like so:

n `choose` k
  | k > n           = undefined
  | k == 0          = 1
  | k > (n `div` 2) = n `choose` (n-k)
  | otherwise       = n * ((n-1) `choose` (k-1)) `div` k

Or in an imperative language like so:

f(n, k) {
  if(k >  n)
    throw some_exception;
  if(k == 0)
    return 1;
  if(k > n/2)
    return f(n,n-k);
  return n * f(n-1,k-1) / k;
}

It is pretty easy to see that both implementations run in $O(k)$ time and avoids overflow errors of fixed precision numbers much better then calculating n!/(n-k)!.

I think it would make more sense for your imperative language to not use recursion. Imperative languages typically can't handle deep recursion. Plus, the function is not tail-recursive so even in a functional language there's a memory overhead associated with the recursive calls. — Dici, Dec 01 '18 at 18:32
There is a tail-recursive implementation on StackOverflow at https://stackoverflow.com/a/65362753/29771. — Glenn, Dec 21 '20 at 16:48

score 17 · Accepted Answer · answered Sep 26 '12 at 02:00

17

Maybe use $$\binom{n}{k}=\frac{n}{k}\binom{n-1}{k-1}.$$

answered Sep 26 '12 at 02:00

André Nicolas

507,029

2

You're going to have to dumb it down a little bit. I don't know how to read it and translate it into code. – Manuel Sep 26 '12 at 03:20
The symbol $\binom{n}{k}$ is the binomial coefficient. This gives a recurrence. You can either write the code more or less directly as is, or work upwards. I am assuming here that you are using real arithmetic rather than integer arithmetic. For integer arithmetic, one can use $\binom{n}{k}=\binom{n-1}{k}+\binom{n-1}{k-1}$, but it is a lot slower. – André Nicolas Sep 26 '12 at 04:32
5

@AndréNicolas You can still multiply by $n/k$ in integer arithmetic: just multiply by $n$ first. If the number is large enough to overflow, one can reduce $n/k$ to lowest terms and then do the division first. – Erick Wong Sep 26 '12 at 17:13

score 9 · Answer 3 · answered Sep 26 '12 at 02:01

9

I presume you are looking for a floating point answer here.

$\prod_{i=0}^{k-1} \frac{n-i}{k-i} $. Compute each term $\frac{n-i}{k-i}$ and multiply.

answered Sep 26 '12 at 02:01

copper.hat

172,524

Why the downvote? Something change in the last 9 years??? – copper.hat Mar 12 '22 at 07:26
This is the same as the accepted answer, why pick me? – copper.hat Mar 12 '22 at 07:27
(I didn't downvote.) Floating-point multiplication is not associative. Would it be better to multiply the larger numbers first, or the smaller numbers first? – mr_e_man Mar 22 '22 at 00:08
@mr_e_man I don't know. I don't think it makes a huge difference as it is all multiplication & division. The simple floating point model suggests that the relative error is of the order $(1+\epsilon)^k$. – copper.hat Mar 22 '22 at 00:42

score 5 · Answer 4 · answered Sep 26 '12 at 02:03

5

Note that $$ \frac{n!}{k!(n-k)!} = \frac{n(n-1)(n-2)\cdots (n-k+1)}{k!} $$ When doing this by hand, there is much cancellation that can be done. In fact, since $\binom{n}{k}$ is an integer, we are guaranteed to be able to cancel all of $k!$.

By they way, since $\binom{n}{k} = \binom{n}{n-k}$, we can always set up the fraction so that there are no more than $n/2$ factors on top and bottom.

Hope this helps!

answered Sep 26 '12 at 02:03

Shaun Ault

8,871

I'm missing something here. COMBIN(5,2) = 10, yet your formula yields 30. – Manuel Sep 26 '12 at 02:52
1

@Manuel, In COMBIN(5,2), $n=2$ and $k = 2$. So we have $\frac{5 \cdot 4}{2 \cdot 1} = 10$. – Shaun Ault Sep 26 '12 at 11:17
2

For futur readers, in @ShaunAult 's previous comment , he meant n = 5 . – soufiane yakoubi Feb 04 '20 at 20:00

score 5 · Answer 5 · answered Sep 26 '12 at 02:05

5

You can use Stirling's approximation to calculate the logarithm. $\ln n!\approx n \ln n - n + \frac 12 \ln (2 \pi n)$. It is quite accurate as $n$ gets large (even $10$)

answered Sep 26 '12 at 02:05

Ross Millikan

374,822

score 2 · Answer 6 · answered May 21 '18 at 11:21

From programmatic aspect, use log transformed version can be very efficient. As I often need p-value derived from cumulative hypergeometric or binomial distribution for quantitatively characterizing the problem at hand, negative 10-based log transformation is required for reserving very very small p-value (e.g., $10^{-20}$) in practice. 10-based log transformed binomial coefficients $C$ can be calculated as: $$log_{10}C=\sum_{i=1}^n log_{10}(i) -\sum_{i=1}^klog_{10}(i)-\sum_{i=1}^{n-k}log_{10}(i)$$ So in programming, simply generate an array S with elements ranging from $1$ to $n$, and $log_{10}$ transform each element. The $log_{10}$ transformed coefficient can be calculated simply by summing the whole array, then subtracting the sum of arrays ranging from $1$ to $k$ and $1$ to $n-k$ from it, i.e., in MATLAB, logc=sum(S)-sum(S(1:k))-sum(S(1:n-k)), and in Python, logc=sum(S)-sum(S[:k])-sum(S[:n-k]). Only one array is needed.

For $k=0$ or $k=n$, set $log_{10}C=0$.

Also when I think about it you can even improve on this, as sum(S)-sum(S[:n-k]) == sum(S[n-k:]), so your last line could just be logc = sum(S[n-k:]) - sum(S[:k]) — Oded R., Jan 09 '21 at 16:04

score 1 · Answer 7 · edited Jan 05 '14 at 02:54

I have written a class to handle common functions for working with the binomial coefficient. It performs the following tasks:

Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.
Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.
Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it might be faster than the link you have found.
Uses Lilavati method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.
The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.
There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

score 1 · Answer 8 · answered May 28 '14 at 17:05

The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.

binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5

do while col < n
    binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
    output(binomial_coefficient)
    col = col + 1
loop

The output of binomial coefficients is therefore:

1
1 *  (5 + 1 - (0 + 1)) / (0 + 1) = 5 
5 *  (5 + 1 - (1 + 1)) / (1 + 1) = 15
15 * (5 + 1 - (2 + 1)) / (2 + 1) = 15 
15 * (5 + 1 - (3 + 1)) / (3 + 1) = 5 
5 *  (5 + 1 - (4 + 1)) / (4 + 1) = 1

I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(

Felix Marin, why would there be memory problem? This is not recursive. — Magnus, Feb 12 '22 at 01:35

How do I compute binomial coefficients efficiently?

8 Answers8

Great, we have everything we need to implement the function recursively!

Linked