3

How many ways can $k$ numbers be chosen from the first $n$ natural numbers so that the longest string of consecutive numbers is exactly $m$ numbers long?

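For example, if choosing $k = 7$ distinct numbers from the first $n = 14$ natural numbers ($1$ to $14$), how many combinations are there in which the longest string of consecutive numbers has length exactly $m = 3$ (several strings of length $3$ and shorter strings are allowed, but none longer)?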

Some sets that satisfy this would be $[1,2,3,5,7,10,13]$, $[1,2,3,7,8,10,11]$, or $[1,2,3,6,9,10,11]$.

Obviously $m \le k \le n$, and the order of the numbers picked does not matter, but there is no repetition, so $[1,2,3]$ is the same as $[2,1,3]$, but $[1,1,3]$ is not allowed.
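For small inputs the count can be checked directly by enumerating every subset. A minimal brute-force sketch, assuming the longest string of consecutive numbers must have length exactly $m$ (the names `longest_run` and `count_brute_force` are only illustrative):

```python
from itertools import combinations

def longest_run(subset):
    """Length of the longest string of consecutive integers in a sorted tuple."""
    best = current = 0
    previous = None
    for value in subset:
        # Extend the current run if this value follows the previous one, else restart.
        current = current + 1 if previous is not None and value == previous + 1 else 1
        best = max(best, current)
        previous = value
    return best

def count_brute_force(n, k, m):
    """Count k-subsets of {1, ..., n} whose longest consecutive run is exactly m."""
    return sum(1 for subset in combinations(range(1, n + 1), k)
               if longest_run(subset) == m)

print(count_brute_force(14, 7, 3))
```

This only scales to small $n$, since it visits all $\binom nk$ subsets, but it is useful for testing any candidate formula.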

For cases when $m \gt \frac k2$ I believe the equation $$2* \binom {n-m-1}{k-m} + (n-m-1)*\binom{n-m-2}{k-m}$$

will produce the correct answer, but for values of $m \le \frac k2$ I do not know how to account for multiple strings of consecutive numbers in the same set (e.g. $[\textbf{1,2,3},6,\textbf{8,9,10}]$) so that they are not counted twice.


If anyone could give me help with this example that I could extrapolate from or (ideally) a formula or any references to solve for a general case it would be much appreciated.

corndog
  • 33
  • I posted an answer, but then looked back at your question and realized that I didn't understand it completely. When you say the selection has exactly $m$ consecutive numbers, do you mean there is exactly one string of $m$ consecutive numbers but that there may also be other consecutive strings that are shorter? – user84413 Jul 23 '17 at 20:14
  • Yes. The way I meant it was that there may be other strings that are shorter, so for example if $n=10$, $k=7$, and $m=3$ you could have $[1,2,3, 5,6,7,9]$ which would have multiple strings of length $m$ or $[1,3,4,5,7,8,10]$ which would have one string of length $m$ but also other shorter strings of consecutive numbers. However, you could not have $[1,2,4,5,6,7,9]$ because that has a string of length greater than $m$. – corndog Jul 24 '17 at 00:23
  • Sorry, I realize that what I wrote could have been interpreted differently than what I meant. Instead of exactly one string of length $m$, I meant that there could be multiple strings, but none longer than length $m$. – corndog Jul 24 '17 at 00:26
  • 1
    Thanks for explaining this; I think we misinterpreted the question because of the way the title is worded. – user84413 Jul 24 '17 at 17:48
  • OK, but then please amend your post, so that readers are not compelled to go through the comments! – G Cab Jul 27 '17 at 00:27
  • @GCab I deleted my original post when I realized this, and posted a different answer instead. – user84413 Jul 27 '17 at 21:14
  • 1
    Good, that sounds better. With standard terminology you are speaking of k-subsets from set $\{1,2,\cdots , n\}$ which contains .... It's a very interesting problem I am trying to work out. – G Cab Jul 29 '17 at 00:42
  • and I could work it out: hope my answer is satisfactory to you – G Cab Aug 01 '17 at 22:02

3 Answers

3

Let $l=n-k$, and line up $l$ sticks (representing the numbers not chosen). The sticks create $l+1$ gaps (including the two ends), and the $k$ chosen numbers fill these gaps.

If we let $x_i$ represent the number of integers in gap $i$, the number of choices with no string longer than $m$ is given by the number of solutions of $x_1+\cdots+x_{l+1}=k\;$ with $0\le x_i\le m$ for each $i$,

so there are $\displaystyle s_1=\binom{n}{l}-\binom{l+1}{1}\binom{n-(m+1)}{l}+\binom{l+1}{2}\binom{n-2(m+1)}{l}-\cdots$ such choices.

To count the number of choices with at least one string of length $m$, we can subtract the number of choices with no string longer than $m-1$, which is given by the number of solutions of

$\hspace{.2 in}x_1+\cdots+x_{l+1}=k\;$ with $0\le x_i\le m-1$ for each $i$,

so there are $\displaystyle s_2=\binom{n}{l}-\binom{l+1}{1}\binom{n-m}{l}+\binom{l+1}{2}\binom{n-2m}{l}-\cdots$ such choices.

Then there are $\displaystyle s_1-s_2=\sum_{j=1}^{\lfloor\frac{k}{m}\rfloor}(-1)^{j+1}\binom{l+1}{j}\left[\binom{n-jm}{l}-\binom{n-j(m+1)}{l}\right]$ possibilities.


(I am using the nonstandard convention that $\dbinom{r}{l}=0$ if $r<l$, including when $r$ is negative.)
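The final sum is easy to evaluate by machine. A minimal sketch, assuming $m \ge 1$ and $k \le n$; the helper `binom` encodes the zero convention above, and both function names are my own choice:

```python
from math import comb

def binom(r, l):
    """Binomial coefficient, taken to be 0 when r < l (including negative r)."""
    return comb(r, l) if 0 <= l <= r else 0

def count_exact(n, k, m):
    """Evaluate s_1 - s_2: k-subsets of {1, ..., n} with longest run exactly m."""
    l = n - k
    total = 0
    for j in range(1, k // m + 1):          # j = 1, ..., floor(k / m)
        sign = 1 if j % 2 == 1 else -1      # (-1)^(j+1)
        total += sign * binom(l + 1, j) * (
            binom(n - j * m, l) - binom(n - j * (m + 1), l)
        )
    return total

print(count_exact(5, 3, 2))    # 3 * (3 - 1) = 6
print(count_exact(14, 7, 3))   # 8 * (330 - 120) - 28 * (8 - 0) = 1456
```

For the example in the question ($n=14$, $k=7$, $m=3$) only the terms $j=1$ and $j=2$ contribute.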

user84413
  • 27,211
  • This still isn't working for me. I'll be honest, I don't quite understand how you got the equations for $s_1$ and $s_2$. But also, the stopping point for the summation ($\lceil\frac{2k-n}{m}\rceil$) doesn't really seem to work because if $k \lt \frac m2$ it goes to zero and the summation returns 0 even if we know this is wrong. – corndog Jul 26 '17 at 23:04
  • 1
    @corndog I think these answers might help: https://math.stackexchange.com/questions/1429561 https://math.stackexchange.com/questions/904734 (I may not have the stopping point for the summation right, but I think it's correct.) – user84413 Jul 26 '17 at 23:23
  • Thanks so much for those links, I admittedly know very little about combinatorics, so just reading all of this is slowly starting to make it all come together for me – corndog Jul 27 '17 at 02:39
  • This is wrong in a few ways. Firstly in the $s_1$ and $s_2$ expressions I see you're using inclusion exclusion to count the number but $\binom{k-(m+1)}{l}$ is not how many ways you have to distribute the remaining $k-(m+1)$ integers in the gaps and even if it was it counts some possibilities twice (For example if k=6 and m=2, you could choose your gap of 3 to be in gap 1 and then distribute the remaining 3 all in gap 2 or you could choose your gap of 3 to be in gap 2 and then distribute the remaining 3 all in gap 1 and these are counted differently). – PJF49 Jul 27 '17 at 09:23
  • Then when you come to work out an expression for $s_1 - s_2$ the limit of your summation is wrong due to the fact $\binom{k-jm}{l}$ is wrong (if $2k\le n$ you have to sum between 1 and 0). Also it is possible for $s_1$ and $s_2$ to have a different number of terms and so putting them in the same summation is impossible. – PJF49 Jul 27 '17 at 10:25
  • @PJF49 I don't understand your comments, but if you give a specific example where this gives the wrong answer, that would be helpful. – user84413 Jul 27 '17 at 18:33
  • Okay examples, if n=2, k=1 and m=1 the possible s – PJF49 Jul 27 '17 at 19:45
  • If $n=2, k=1, m=1$ the possible subsets are $[1]$ and $[2]$ whereas your formula gives $\sum_{j=1}^{0}(-1)^{j+1}\binom{l+1}{j}\left[\binom{k-jm}{l}-\binom{k-j(m+1)}{l}\right]$ which isn't defined as the upper limit is less than the lower limit. Even worse if $n=3, k=1, m=1$ there are three possible subsets but in your formula the upper limit is $-1$. Even if the upper limit is positive it still doesn't work. For example if $n=3, k=2, m=2$ the possible subsets are $[1,2]$ and $[2,3]$ whereas your formula gives $\binom{2}{1}\left[\binom{0}{1}-\binom{-1}{1}\right]$ which is obviously wrong. – PJF49 Jul 27 '17 at 19:59
  • 1
    If $n=5,k=3,m=2$ the possible subsets are $[1,2,4], [1,2,5], [1,3,4], [1,4,5], [2,3,5], [2,4,5]$. However your formula gives $\sum_{j=1}^{\lceil\frac{6-5}{2}\rceil}(-1)^{j+1}\binom{2+1}{j}\left[\binom{3-2j}{2}-\binom{3-3j}{2}\right] = \binom{3}{1}\left[\binom{1}{2}-\binom{0}{2}\right]$ which is also wrong. – PJF49 Jul 27 '17 at 20:14
  • @PJF49 Thank you for these examples -- this helped me find a typo in my formula, and a mistake in the upper limit. – user84413 Jul 27 '17 at 21:10
  • 1
    Okay it seems your solution now works and some of bits I said were wrong were actually right, it took me a while to work out what some of the terms were doing (sorry about that). Your solution is now very nice, I wish I'd thought of some of it. – PJF49 Jul 27 '17 at 21:45
  • 1
    I saw your amended formula after I posted mine. The results coincide (except that your formula gives 0 for the case n=k) (+1) – G Cab Aug 01 '17 at 22:34
  • @user84413 in the final function, should the two expressions in the brackets $\binom{n-jm}{l}$ and $\binom{n-j(m+1)}{l}$ be switched? What you have appears to be $s_2-s_1$ – corndog Aug 11 '17 at 21:36
  • also could you explain how you got the stopping point of $\lfloor{\frac km}\rfloor$ – corndog Aug 12 '17 at 15:16
  • @corndog I believe the expression given equals $s_1-s_2$, because of the alternating signs in both sums. I found the upper limit for the sum by solving $n-jm<l$, where $l=n-k$. – user84413 Aug 12 '17 at 20:19
  • Yeah you're right, I had just flipped the powers for $(-1)$ (I had $(-1)^j$, not $(-1)^{j+1}$), so everything was coming out negative. – corndog Aug 16 '17 at 03:39
1

Another more practical solution uses a recurrence relation. Let the sizes of the gaps between the $n-k$ numbers not chosen be $x_1,x_2,...,x_{n-k+1}$. Then the problem reduces to counting the solutions of $\sum_{i=1}^{n-k+1}x_i = k$ such that $x_i\in [0,1...m]$ for all $i$ and at least one $x_i = m$.

To solve this first consider the solutions of another similar problem, $\forall i$, $x_i\in [0,1...m]$, $\sum_{i=1}^{y}x_i = k$ where the constraint at least one of $x_i = m$ doesn't exist and $n-k+1$ has been replaced by $y$. Let the number of solutions of this problem be the function $S(y,k,m)$. Next consider removing the last $x_i$ in a solution of $y,k,m$. Then the remaining $x_i$ will give us a solution of $y-1,k-a,m$ for some $a\in [0,1...m]$. Therefore $S(y,k,m) = \sum_{a=0}^{m}S(y-1,k-a,m)$. $S(1,k,m) = 1$ iff $k\in [0,1...m]$ which allows us to tabulate all other values.

For example in the $m=2$ case:

$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|} \hline y \backslash k & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 2 & 1 & 2 & 3 & 2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ \hline 3 & 1 & 3 & 6 & 7 & 6 & 3 & 1 & 0 & 0 & 0 & 0 \\ \hline 4 & 1 & 4 & 10 & 16 & 19 & 16 & 10 & 4 & 1 & 0 & 0 \\ \hline 5 & 1 & 5 & 15 & 30 & 45 & 51 & 45 & 30 & 15 & 5 & 1 \\ \hline \end{array}$$
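The table can be regenerated directly from the recurrence. A minimal memoized sketch, assuming $y \ge 1$ (the function name `S` simply mirrors the notation above):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def S(y, k, m):
    """Number of solutions of x_1 + ... + x_y = k with every x_i in {0, ..., m}."""
    if k < 0:
        return 0
    if y == 1:
        # Base case: a single variable can only take the value k if k <= m.
        return 1 if k <= m else 0
    # Remove the last variable, which may take any value a in {0, ..., m}.
    return sum(S(y - 1, k - a, m) for a in range(m + 1))

for y in range(1, 6):
    print(y, [S(y, k, 2) for k in range(11)])
```

Each printed row corresponds to one row of the $m=2$ table.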

Then let $P(y,k,m)$ be the number of solutions of the problem with the constraint at least one of $x_i = m$ included. $P(y,k,m) = S(y,k,m) - S(y,k,m-1)$.

For example $P(3,3,2) = S(3,3,2) - S(3,3,1) = 7-1 = 6$ (with $S(3,3,1)$ quickly worked out by hand).

Therefore the number of solutions of the original problem with $n=5, k=3, m=2$ is $P(5-3+1,3,2) = P(3,3,2) = 6$, and as the possible solutions of this problem are $[1,2,4], [1,2,5], [1,3,4], [1,4,5], [2,3,5], [2,4,5]$ this is indeed correct.

Although this is a bit time consuming by hand it should be possible to get a computer to use the recurrence relation to generate the solutions.
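As a rough illustration of how a computer could do this, the sketch below fills in one row of the table at a time and then takes the difference $S(y,k,m)-S(y,k,m-1)$. It assumes $1 \le m$ and $k \le n$; the names `bounded_solutions` and `count_exact` are only illustrative.

```python
def bounded_solutions(y, k, m):
    """S(y, k, m): solutions of x_1 + ... + x_y = k with 0 <= x_i <= m, built row by row."""
    if m < 0:
        return 0
    # Row y = 1: exactly one solution for each total 0..m, none beyond.
    row = [1 if total <= m else 0 for total in range(k + 1)]
    for _ in range(y - 1):
        # Next row: the new variable takes a value a in {0, ..., m}, with a <= total.
        row = [sum(row[total - a] for a in range(min(m, total) + 1))
               for total in range(k + 1)]
    return row[k]

def count_exact(n, k, m):
    """P(n - k + 1, k, m) = S(y, k, m) - S(y, k, m - 1) with y = n - k + 1."""
    y = n - k + 1
    return bounded_solutions(y, k, m) - bounded_solutions(y, k, m - 1)

print(count_exact(5, 3, 2))   # S(3,3,2) - S(3,3,1) = 7 - 1 = 6
```

Only the previous row is kept in memory, so a single value costs roughly $y \cdot k \cdot m$ additions.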

user84413
  • 27,211
PJF49
  • 506
  • I quite like this idea, as it is a lot more practical and straightforward. It's a shame though that it involves so much backtracking to produce a lot of solutions. – corndog Jul 29 '17 at 20:40
1

So we can arrange the $k$ chosen numbers in order, and therefore we are speaking of k-subsets from the set $\{1,\cdots , n\}$.

Let's
- put a $0$ in front of the subset (sequence);
- take the backward differences;
- neglecting the first ($=a_1$), consider the differences from the 2nd to the $k$-th term.
$$ \bbox[lightyellow] { \eqalign{ & 0,a_{\,1} ,a_{\,2} , \cdots ,a_{\,k} \quad \left| {\;1 \le j \le a_{\,j} \le n} \right.\quad \Rightarrow \cr & \;a_{\,1} ,a_{\,2} - a_{\,1} , \cdots ,a_{\,k} - a_{\,k - 1} = a_{\,1} ,d_{\,2} , \cdots ,d_{\,k} \quad \Rightarrow \cr & \Rightarrow \quad d_{\,2} ,d_{\,3} , \cdots ,d_{\,k} \quad \left| \matrix{ \;2 \le j \hfill \cr \;1 \le d_{\,j} = a_{\,j} - a_{\,j - 1} \le n - 1 \hfill \cr \;1 \le \sum\limits_{2\, \le \,j\, \le \,k} {d_{\,j} } = q = a_{\,k} - a_{\,1} \le n - 1 \hfill \cr} \right. \cr} } \tag{1}$$

Then we are looking for the
number of (standard) compositions of $0 \le q \le n-1$ into $1 \le k-1 \le n-1$ parts, each positive (and no greater than $n-1$), that contain runs of contiguous ones no longer than $0 \le m-1$.
We will take for the moment the cumulative version "runs no longer than", and will later manage to get the version "runs with max length equal to".
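To make the correspondence concrete, here is a small sketch that maps a subset to its differences $d_2,\dots,d_k$ and checks that a run of $m$ consecutive elements shows up as a run of $m-1$ ones. The helper names are my own, and the subset is one of the examples from the question.

```python
def differences(subset):
    """Return d_2, ..., d_k for a sorted subset a_1 < ... < a_k."""
    return [b - a for a, b in zip(subset, subset[1:])]

def longest_run_of_ones(parts):
    """Length of the longest block of contiguous parts equal to 1."""
    best = current = 0
    for part in parts:
        current = current + 1 if part == 1 else 0
        best = max(best, current)
    return best

subset = [1, 2, 3, 6, 9, 10, 11]
d = differences(subset)                  # [1, 1, 3, 3, 1, 1]
q = sum(d)                               # a_k - a_1 = 11 - 1 = 10
longest = longest_run_of_ones(d) + 1     # longest consecutive string in the subset: 3
print(d, q, longest)
```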

For $k=1$, the above scheme does not apply, but for subsets with one element, clearly, the number of contiguous elements is $1$, and there are $n$ possible subsets.
For $2 \le k$, the same value of $q=a_{k}-a_{1}$ can be attained in $n-q$ ways, and the result shall be summed over $1 \le q \le n-1$.

It is known that the number of compositions of $q$ into $k$ parts is $$ \bbox[lightyellow] { N_{\,s\,c} (q,k) = \left[ {1 \le q} \right]\left( \matrix{ q - 1 \cr k - 1 \cr} \right) } \tag{2.a}$$ where $[P]$ denotes the Iverson bracket.

We can partition this quantity according to the number of ones that it contains, denoted by $0\le s \le k$, as $$ \bbox[lightyellow] { \eqalign{ & N_{\,s\,cs} (q,k,s) = \left[ {k = q} \right]\left[ {k = s} \right] + \left[ {k + 1 \le q} \right]\left( \matrix{ q - k - 1 \cr k - s - 1 \cr} \right)\left( \matrix{ k \cr s \cr} \right) = \cr & = \left( {\left[ {k = q} \right]\left[ {k = s} \right] + \left[ {k + 1 \le q} \right]\left( \matrix{ q - k - 1 \cr k - s - 1 \cr} \right)} \right)\left( \matrix{ k \cr s \cr} \right) \cr} } \tag{2.b}$$ where the second factor (${{k} \choose {s}}$) corresponds to the number of binary strings $(1,\cdots,X,\cdots)$ obtained by replacing with $X$ the parts greater than one, and the first factor to the number of ways to compose the $X$s.
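Both (2.a) and (2.b) can be sanity-checked by listing the compositions for small $q$ and $k$. A brute-force sketch, assuming $q \ge 1$, $k \ge 1$ and $0 \le s \le k$ (the function names are illustrative only):

```python
from itertools import product
from math import comb

def compositions(q, k):
    """All ordered k-tuples of positive integers summing to q (brute force)."""
    return [parts for parts in product(range(1, q + 1), repeat=k) if sum(parts) == q]

def n_sc(q, k):
    """Formula (2.a): number of compositions of q into k parts."""
    return comb(q - 1, k - 1) if q >= 1 else 0

def n_scs(q, k, s):
    """Formula (2.b): compositions of q into k parts, exactly s of them equal to 1."""
    if k == q:
        fill = 1 if k == s else 0            # all parts are ones
    elif q >= k + 1 and k - s - 1 >= 0:
        fill = comb(q - k - 1, k - s - 1)    # ways to fill the parts greater than one
    else:
        fill = 0
    return fill * comb(k, s)

# Example: q = 6, k = 3 splits as 1 + 6 + 3 + 0 = 10 compositions for s = 0, 1, 2, 3.
print([n_scs(6, 3, s) for s in range(4)], n_sc(6, 3))
```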

Coming to the binary string, clearly we can change the $X$ with a $0$, then
the number of binary strings of a given length , number of ones, and max length of runs of ones
is extensively treated in this other post.

From there we have that the
Number of binary strings, with $s$ ones , $m$ zeros and runs of ones which are not longer than $r$
is given by $$ \bbox[lightyellow] { N_b (s,r,m + 1)\quad \left| {\;0 \le {\rm integers }\,s,m,r} \right.\quad = \sum\limits_{\left( {0\, \le } \right)\,\,j\,\,\left( { \le \,{s \over r}\, \le \,m + 1} \right)} {\left( { - 1} \right)^j \left( \matrix{ m + 1 \cr j \cr} \right)\left( \matrix{ s + m - j\left( {r + 1} \right) \cr s - j\left( {r + 1} \right) \cr} \right)} }\tag {3.a}$$
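Formula (3.a) can be compared against direct enumeration for small parameters. In this sketch the argument `zeros` plays the role of $m$ in (3.a), i.e. the third argument of $N_b$ minus one, and binomials outside the usual range are taken as zero. Both function names are my own.

```python
from itertools import combinations
from math import comb

def binom(a, b):
    """Binomial coefficient, taken as 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def n_b(s, r, zeros):
    """Formula (3.a): strings with s ones and `zeros` zeros, runs of ones no longer than r."""
    return sum((-1) ** j * binom(zeros + 1, j)
               * binom(s + zeros - j * (r + 1), s - j * (r + 1))
               for j in range(zeros + 2))

def n_b_brute(s, r, zeros):
    """Same count by enumerating the positions of the ones."""
    length = s + zeros
    count = 0
    for ones in combinations(range(length), s):
        chosen = set(ones)
        run = best = 0
        for position in range(length):
            run = run + 1 if position in chosen else 0
            best = max(best, run)
        count += best <= r
    return count

print(n_b(3, 1, 2), n_b_brute(3, 1, 2))   # only 10101 qualifies, so both give 1
```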

Translating that into our case, the number of
binary strings with length $k$ and $s$ ones and runs of ones no longer than $m$ is $$ \bbox[lightyellow] { \eqalign{ & N_b (s,m,k + 1 - s)\quad \left| {\;0 \le {\rm integers }\, s,m,k - s} \right.\quad = \cr & = \left[ {\;0 \le s} \right]\left[ {\;0 \le m} \right]\left[ {\;s \le k} \right]\sum\limits_{\left( {0\, \le } \right)\,\,j\,\,\left( { \le \,\,k - s + 1} \right)} {\left( { - 1} \right)^j \left( \matrix{ k + 1 - s \cr j \cr} \right)\left( \matrix{ k - j\left( {m + 1} \right) \cr s - j\left( {m + 1} \right) \cr} \right)} \cr} } \tag{3.b}$$

Going back through the steps above, the number of compositions of $q$, with length $k$, number of ones $s$ and runs of ones no longer than $m$ will be the above multiplied by the first factor in (2.b) $$ \bbox[lightyellow] { \eqalign{ & N_{\,c\,q\,s} (q,k,s,m)\quad \left| {\;{\rm integers }\,q,s,m,k} \right.\quad = \cr & = \left( {\left[ {k = q} \right]\left[ {k = s} \right] + \left[ {k + 1 \le q} \right]\left( \matrix{ q - k - 1 \cr k - s - 1 \cr} \right)} \right)N_b (s,m,k - s + 1) = \cr & = \left( {\left[ {k = q} \right]\left[ {k = s} \right] + \left[ {k + 1 \le q} \right]\left( \matrix{ q - k - 1 \cr k - s - 1 \cr} \right)} \right)\; \cdot \cr & \cdot \sum\limits_{\left( {0\, \le } \right)\,\,j\,\,\left( { \le \,\,k - s + 1} \right)} {\left( { - 1} \right)^j \left( \matrix{ k + 1 - s \cr j \cr} \right)\left( \matrix{ k - j\left( {m + 1} \right) \cr s - j\left( {m + 1} \right) \cr} \right)} \cr} } \tag{4}$$

Therefore, the
number of k-subsets from the set $\{1,\cdots , n\}$ having no more than $m$ contiguous elements
is $$ \bbox[lightyellow] { \eqalign{ & N_{c\,u\,m} (n,k,m) = \left[ {1 = k} \right]\left[ {1 \le m} \right]n + \cr & + \left[ {2 \le k} \right]\sum\limits_{0\, \le \,q\, \le \,n - 1} {\sum\limits_{0\, \le \,s\, \le \,k} {\left( {n - q} \right)N_{\,c\,q\,s} (q,k - 1,s,m - 1)} } \cr} } \tag{5.a}$$

Finally, the number of k-subsets with one or more runs of contiguous elements of length $m$, and none longer, will clearly be $$ \bbox[lightyellow] { N_{c\,u\,m} (n,k,m)-N_{c\,u\,m} (n,k,m-1) } \tag{5.b}$$
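Formulas (3.b), (4), (5.a) and (5.b) can be chained together as follows. This is only a sketch under my reading of the formulas: `n_b(s, m, k)` takes the string length $k$ directly (so it stands for $N_b(s,m,k+1-s)$ in (3.b)), the Iverson brackets become explicit conditions, and binomials outside $0 \le b \le a$ are taken as zero. All function names are illustrative.

```python
from math import comb

def binom(a, b):
    """Binomial coefficient, taken as 0 outside 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0

def n_b(s, m, k):
    """Formula (3.b): binary strings of length k with s ones and runs of ones <= m."""
    if s < 0 or m < 0 or s > k:
        return 0
    return sum((-1) ** j * binom(k + 1 - s, j)
               * binom(k - j * (m + 1), s - j * (m + 1))
               for j in range(k - s + 2))

def n_cqs(q, k, s, m):
    """Formula (4): compositions of q into k parts, s of them ones, runs of ones <= m."""
    fill = 1 if (k == q and k == s) else 0
    if q >= k + 1:
        fill += binom(q - k - 1, k - s - 1)
    return fill * n_b(s, m, k)

def n_cum(n, k, m):
    """Formula (5.a): k-subsets of {1, ..., n} with no run longer than m."""
    if k == 1:
        return n if m >= 1 else 0
    if k < 2:
        return 0
    return sum((n - q) * n_cqs(q, k - 1, s, m - 1)
               for q in range(n) for s in range(k + 1))

def count_exact(n, k, m):
    """Formula (5.b): longest run exactly m."""
    return n_cum(n, k, m) - n_cum(n, k, m - 1)

print(n_cum(5, 3, 2), n_cum(5, 3, 1), count_exact(5, 3, 2))   # 7, 1, 6
```

For $n=5$, $k=3$ the seven subsets with no run longer than $2$ are all $3$-subsets except $[1,2,3]$, $[2,3,4]$, $[3,4,5]$, and the only one with no run longer than $1$ is $[1,3,5]$.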

G Cab
  • 35,272