The Catalan number $C_n$ counts the number of ways $n$ pairs of parentheses can be written so that no close-parenthesis comes before its matching open-parenthesis, and it also counts the number of ways $n+1$ factors can be grouped under a non-associative binary operator.

For example, $C_3 = 5$ and the permissible strings of parentheses are (copied from Wikipedia):

((())) ()(()) ()()() (())() (()())

The ways to group four factors $a, b, c, d$ are:

((a*b)*c)*d (a*(b*c))*d (a*b)*(c*d) a*((b*c)*d) a*(b*(c*d))

What is the bijection between these two sets? I know that both of them can be shown directly to be counted by the Catalan numbers, but I want to know what the correct map is from one set to the other.

NoName
  • Note that the accepted answer to the question that I closed this as a duplicate of makes essentially the same points as I did in my answer: the recurrence is the natural point of departure; it has a natural interpretation for parenthesized factors; and a rather less natural interpretation that involves an asymmetric choice for parenthesis strings. – joriki Mar 08 '20 at 19:15

1 Answer

I’m not sure there is such a thing as “the correct map”, but a condition that one might want to impose on any reasonable map is that it be compatible with the recurrence relation for the Catalan numbers,

$$ C_{n+1}=\sum_{k=0}^nC_kC_{n-k}\;. $$

For $n+1$ non-associative operations (with $n+2$ factors), the derivation of this recurrence is straightforward: There is exactly one outermost operation, and its two operands must contain a total of $n$ operations for the expression as a whole to contain $n+1$ operations.
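As a rough check of this argument, here is a minimal Python sketch that enumerates groupings by choosing where the outermost operation splits the factors, and compares the count with the recurrence. The function names `groupings` and `catalan` are illustrative choices, and the output keeps the outermost pair of parentheses that the lists above omit.

```python
from functools import lru_cache


def groupings(factors):
    """Yield every full grouping of a sequence of factors, e.g. "abcd"."""
    if len(factors) == 1:
        yield factors[0]
        return
    # Exactly one outermost operation: it splits the factors into a
    # nonempty left operand and a nonempty right operand.
    for i in range(1, len(factors)):
        for left in groupings(factors[:i]):
            for right in groupings(factors[i:]):
                yield f"({left}*{right})"


@lru_cache(maxsize=None)
def catalan(n):
    """Catalan number via C_{n+1} = sum_{k=0}^{n} C_k C_{n-k}, with C_0 = 1."""
    if n == 0:
        return 1
    return sum(catalan(k) * catalan(n - 1 - k) for k in range(n))


if __name__ == "__main__":
    for expr in groupings("abcd"):
        print(expr)
    print(catalan(3))
```

With four factors this should list the five groupings from the question, each wrapped in one extra pair of parentheses.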

For parenthesized expressions, the derivation of the recurrence is a bit more involved (which already shows you that the bijection isn’t trivial). It’s given at “Showing Directly that Dyck Paths Satisfy the Catalan Recurrence”. The factor $C_k$ counts the possible contents of the shortest fully balanced initial substring once its enclosing parentheses are removed, and the factor $C_{n-k}$ counts the possibilities for the rest of the string. Note that there’s an arbitrary choice here whether to use the shortest fully balanced initial or final substring; there was no such arbitrary choice in the case of the operations (which again shows you that the bijection isn’t trivial).
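The decomposition described here can be sketched in Python as follows. The names `split_first_return` and `dyck_words` are hypothetical, and the splitter assumes its input is a nonempty, correctly balanced string; it does not validate anything beyond that.

```python
def split_first_return(word):
    """Split a nonempty balanced word as "(" + inner + ")" + rest, where
    "(" + inner + ")" is the shortest fully balanced initial substring."""
    depth = 0
    for i, ch in enumerate(word):
        depth += 1 if ch == "(" else -1
        if depth == 0:
            # First return to depth 0: the initial substring ends here.
            return word[1:i], word[i + 1:]
    raise ValueError("word is empty or not balanced")


def dyck_words(n):
    """Yield all balanced strings with n pairs, built from the recurrence:
    k pairs inside the first balanced block, n - 1 - k pairs after it."""
    if n == 0:
        yield ""
        return
    for k in range(n):
        for inner in dyck_words(k):
            for rest in dyck_words(n - 1 - k):
                yield "(" + inner + ")" + rest


if __name__ == "__main__":
    print(split_first_return("(())()"))
    print(list(dyck_words(3)))
```

Because every nonempty balanced string splits in exactly one way, `dyck_words` produces each string once, which is the content of the recurrence.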

So a bijection that respects the recurrence would consist in putting the outermost operation between the contents of the shortest fully balanced initial substring (as the left operand) and the rest (as the right operand), and applying this recursively to the two substrings (with empty strings corresponding to the actual factors). For $n=3$, this yields:

((()))  <-->  ((a*b)*c)*d
()(())  <-->  a*((b*c)*d)
()()()  <-->  a*(b*(c*d))
(())()  <-->  (a*b)*(c*d)
(()())  <-->  (a*(b*c))*d
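The table above can be reproduced with a short recursive sketch. The function name `dyck_to_grouping` is a hypothetical choice, factor names are assigned left to right, and the input is assumed to be a balanced string.

```python
def dyck_to_grouping(word, names):
    """Map a balanced word with n pairs to a grouping of n + 1 factors,
    using the shortest fully balanced INITIAL substring at each step."""
    names = iter(names)

    def build(w):
        if not w:
            return next(names)  # empty string <-> a single factor
        # Find the first return to depth 0.
        depth = 0
        for i, ch in enumerate(w):
            depth += 1 if ch == "(" else -1
            if depth == 0:
                break
        inner, rest = w[1:i], w[i + 1:]
        left = build(inner)   # contents of the initial block: left operand
        right = build(rest)   # remainder of the string: right operand
        return f"({left}*{right})"

    expr = build(word)
    # Drop the outermost parentheses to match the notation in the table.
    return expr[1:-1] if word else expr


if __name__ == "__main__":
    for w in ["((()))", "()(())", "()()()", "(())()", "(()())"]:
        print(w, " <--> ", dyck_to_grouping(w, "abcd"))
```

Building the left operand before the right one is what makes the factor names come out in left-to-right order.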

If we use the shortest fully balanced final substring instead (its contents becoming the right operand and the rest the left operand), the correspondence would be:

((()))  <-->  a*(b*(c*d))
()(())  <-->  (a*b)*(c*d)
()()()  <-->  ((a*b)*c)*d
(())()  <-->  (a*(b*c))*d
(()())  <-->  a*((b*c)*d)
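The same kind of sketch works for this second variant. Again the names are hypothetical and balanced input is assumed. The helper `read_off` is an extra cross-check: it wraps the grouping in an outer pair of parentheses, then reads each `*` as an open parenthesis and each `)` as a close parenthesis, which appears to invert this map for the five cases here.

```python
def dyck_to_grouping_final(word, names):
    """Map a balanced word to a grouping, using the shortest fully balanced
    FINAL substring at each step."""
    names = iter(names)

    def build(w):
        if not w:
            return next(names)  # empty string <-> a single factor
        # Scan from the right for the first return to depth 0.
        depth = 0
        for i in range(len(w) - 1, -1, -1):
            depth += 1 if w[i] == ")" else -1
            if depth == 0:
                break
        rest, inner = w[:i], w[i + 1:-1]
        left = build(rest)    # remainder of the string: left operand
        right = build(inner)  # contents of the final block: right operand
        return f"({left}*{right})"

    expr = build(word)
    return expr[1:-1] if word else expr


def read_off(grouping):
    """Read '*' as '(' and ')' as ')' in the fully parenthesized grouping."""
    full = f"({grouping})"
    return "".join("(" if ch == "*" else ")" for ch in full if ch in "*)")


if __name__ == "__main__":
    for w in ["((()))", "()(())", "()()()", "(())()", "(()())"]:
        g = dyck_to_grouping_final(w, "abcd")
        print(w, " <--> ", g, "  read back:", read_off(g))
```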
joriki
  • The latter corresponds to the map I thought of (with the parenthesized factors, read * as an open parenthesis and ) as a close parenthesis, so (a*(b*(c*d))) gives us ((()))), but your explanation of tying it to the recurrence is more satisfying. – NoName Mar 08 '20 at 02:27