24

I'm trying to reproduce Excel's COMBIN function in C#. The number of combinations is as follows, where number = n and number_chosen = k:

$${n \choose k} = \frac{n!}{k! (n-k)!}.$$

I can't use this formula because the factorial overflows the computer's capacity really quick. Int32 can only do up to 12!, Int64 up to 20!, and double up to a whopping 170! after that I'm on my own.

How can I get the same results with a formula more gentle to the computer?

Manuel
  • 495

8 Answers8

27

André Nicolas correctly identifies that we should maybe use: $$\binom{n}{k}=\frac{n}{k}\binom{n-1}{k-1}$$

But we should also use:

$$\binom{n}{k}=\binom{n}{n-k}$$

And this also seems important:

$$\binom{n}{0} = \frac{n!}{(n-0)!} = \frac{n!}{n!} = 1$$

Great, we have everything we need to implement the function recursively!


I like to implement n choose k in a functional language like so:

n `choose` k
  | k > n           = undefined
  | k == 0          = 1
  | k > (n `div` 2) = n `choose` (n-k)
  | otherwise       = n * ((n-1) `choose` (k-1)) `div` k

Or in an imperative language like so:

f(n, k) {
  if(k >  n)
    throw some_exception;
  if(k == 0)
    return 1;
  if(k > n/2)
    return f(n,n-k);
  return n * f(n-1,k-1) / k;
}

It is pretty easy to see that both implementations run in $O(k)$ time and avoids overflow errors of fixed precision numbers much better then calculating n!/(n-k)!.

Bodhi
  • 65
  • 1
    I think it would make more sense for your imperative language to not use recursion. Imperative languages typically can't handle deep recursion. Plus, the function is not tail-recursive so even in a functional language there's a memory overhead associated with the recursive calls. – Dici Dec 01 '18 at 18:32
  • 1
    There is a tail-recursive implementation on StackOverflow at https://stackoverflow.com/a/65362753/29771. – Glenn Dec 21 '20 at 16:48
17

Maybe use $$\binom{n}{k}=\frac{n}{k}\binom{n-1}{k-1}.$$

André Nicolas
  • 507,029
  • 2
    You're going to have to dumb it down a little bit. I don't know how to read it and translate it into code. – Manuel Sep 26 '12 at 03:20
  • The symbol $\binom{n}{k}$ is the binomial coefficient. This gives a recurrence. You can either write the code more or less directly as is, or work upwards. I am assuming here that you are using real arithmetic rather than integer arithmetic. For integer arithmetic, one can use $\binom{n}{k}=\binom{n-1}{k}+\binom{n-1}{k-1}$, but it is a lot slower. – André Nicolas Sep 26 '12 at 04:32
  • 5
    @AndréNicolas You can still multiply by $n/k$ in integer arithmetic: just multiply by $n$ first. If the number is large enough to overflow, one can reduce $n/k$ to lowest terms and then do the division first. – Erick Wong Sep 26 '12 at 17:13
9

I presume you are looking for a floating point answer here.

$\prod_{i=0}^{k-1} \frac{n-i}{k-i} $. Compute each term $\frac{n-i}{k-i}$ and multiply.

copper.hat
  • 172,524
  • Why the downvote? Something change in the last 9 years??? – copper.hat Mar 12 '22 at 07:26
  • This is the same as the accepted answer, why pick me? – copper.hat Mar 12 '22 at 07:27
  • (I didn't downvote.) Floating-point multiplication is not associative. Would it be better to multiply the larger numbers first, or the smaller numbers first? – mr_e_man Mar 22 '22 at 00:08
  • @mr_e_man I don't know. I don't think it makes a huge difference as it is all multiplication & division. The simple floating point model suggests that the relative error is of the order $(1+\epsilon)^k$. – copper.hat Mar 22 '22 at 00:42
5

Note that $$ \frac{n!}{k!(n-k)!} = \frac{n(n-1)(n-2)\cdots (n-k+1)}{k!} $$ When doing this by hand, there is much cancellation that can be done. In fact, since $\binom{n}{k}$ is an integer, we are guaranteed to be able to cancel all of $k!$.

By they way, since $\binom{n}{k} = \binom{n}{n-k}$, we can always set up the fraction so that there are no more than $n/2$ factors on top and bottom.

Hope this helps!

Shaun Ault
  • 8,871
5

You can use Stirling's approximation to calculate the logarithm. $\ln n!\approx n \ln n - n + \frac 12 \ln (2 \pi n)$. It is quite accurate as $n$ gets large (even $10$)

Ross Millikan
  • 374,822
2

From programmatic aspect, use log transformed version can be very efficient. As I often need p-value derived from cumulative hypergeometric or binomial distribution for quantitatively characterizing the problem at hand, negative 10-based log transformation is required for reserving very very small p-value (e.g., $10^{-20}$) in practice. 10-based log transformed binomial coefficients $C$ can be calculated as: $$log_{10}C=\sum_{i=1}^n log_{10}(i) -\sum_{i=1}^klog_{10}(i)-\sum_{i=1}^{n-k}log_{10}(i)$$ So in programming, simply generate an array S with elements ranging from $1$ to $n$, and $log_{10}$ transform each element. The $log_{10}$ transformed coefficient can be calculated simply by summing the whole array, then subtracting the sum of arrays ranging from $1$ to $k$ and $1$ to $n-k$ from it, i.e., in MATLAB, logc=sum(S)-sum(S(1:k))-sum(S(1:n-k)), and in Python, logc=sum(S)-sum(S[:k])-sum(S[:n-k]). Only one array is needed.

For $k=0$ or $k=n$, set $log_{10}C=0$.

Elkan
  • 129
  • Super neat trick - love it! Exactly what I was looking for – Oded R. Jan 09 '21 at 15:58
  • Also when I think about it you can even improve on this, as sum(S)-sum(S[:n-k]) == sum(S[n-k:]), so your last line could just be logc = sum(S[n-k:]) - sum(S[:k]) – Oded R. Jan 09 '21 at 16:04
1

I have written a class to handle common functions for working with the binomial coefficient. It performs the following tasks:

  1. Outputs all the K-indexes in a nice format for any N choose K to a file. The K-indexes can be substituted with more descriptive strings or letters. This method makes solving this type of problem quite trivial.

  2. Converts the K-indexes to the proper index of an entry in the sorted binomial coefficient table. This technique is much faster than older published techniques that rely on iteration. It does this by using a mathematical property inherent in Pascal's Triangle. My paper talks about this. I believe I am the first to discover and publish this technique, but I could be wrong.

  3. Converts the index in a sorted binomial coefficient table to the corresponding K-indexes. I believe it might be faster than the link you have found.

  4. Uses Lilavati method to calculate the binomial coefficient, which is much less likely to overflow and works with larger numbers.

  5. The class is written in .NET C# and provides a way to manage the objects related to the problem (if any) by using a generic list. The constructor of this class takes a bool value called InitTable that when true will create a generic list to hold the objects to be managed. If this value is false, then it will not create the table. The table does not need to be created in order to perform the 4 above methods. Accessor methods are provided to access the table.

  6. There is an associated test class which shows how to use the class and its methods. It has been extensively tested with 2 cases and there are no known bugs.

To read about this class and download the code, see Tablizing The Binomial Coeffieicent.

MJD
  • 65,394
  • 39
  • 298
  • 580
Bob Bryan
  • 121
1

The following code shows how to obtain all the binomial coefficients for a given size 'n'. You could easily modify it to stop at a given k in order to determine nCk. It is computationally very efficient, it's simple to code, and works for very large n and k.

binomial_coefficient = 1
output(binomial_coefficient)
col = 0
n = 5

do while col < n
    binomial_coefficient = binomial_coefficient * (n + 1 - (col + 1)) / (col + 1)
    output(binomial_coefficient)
    col = col + 1
loop

The output of binomial coefficients is therefore:

1
1 *  (5 + 1 - (0 + 1)) / (0 + 1) = 5 
5 *  (5 + 1 - (1 + 1)) / (1 + 1) = 15
15 * (5 + 1 - (2 + 1)) / (2 + 1) = 15 
15 * (5 + 1 - (3 + 1)) / (3 + 1) = 5 
5 *  (5 + 1 - (4 + 1)) / (4 + 1) = 1

I had found the formula once upon a time on Wikipedia but for some reason it's no longer there :(