Formal grammar with constraints on the number of each symbol

Question

I have a language where each type of symbol is only allowed a particular number of times, but the order isn't important. For example, let's say there are three symbols $a, b, c$, and a valid string in the language consists of at most 5 $a$, 3 $b$ and 2 $c$ characters. So $ababbc$ is valid, $ccbbbaaaaa$ is valid, but $abcabcabc$ is not. I'm struggling to come up with a grammar that satisfies such constraints without resorting to enumerating each possibility. Is it possible to have a concise set of rules that encode these constraints?

What do you need the grammar for? – reinierpost, Mar 09 '20 at 09:39

rici · Accepted Answer · 2020-03-09T15:41:49.870

No.

Since the permutation language is finite, you can produce a grammar by enumerating the individual sentences which comprise the language. Some simplifications are possible, for example by finding common prefixes and/or suffixes. But there is no grammar which is significantly smaller than the language itself (for some definition of "significantly").
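To get a feel for how large that enumeration is, here is a rough sketch that counts the sentences without listing them. It assumes single-character symbols with an "at most" limit each, as in the question; `count_strings` and `enumerate_strings` are names made up for this illustration, and the brute-force version is only there as a cross-check on small limits.

```python
from itertools import product
from math import factorial


def count_strings(limits):
    """Count distinct strings in which symbol i occurs at most limits[i] times."""
    total = 0
    # Each tuple of counts fixes how many times every symbol occurs.
    for counts in product(*(range(limit + 1) for limit in limits)):
        # Distinct orderings of that multiset: a multinomial coefficient.
        arrangements = factorial(sum(counts))
        for c in counts:
            arrangements //= factorial(c)
        total += arrangements
    return total


def enumerate_strings(symbols, limits):
    """Brute-force enumeration of the language; only practical for small limits."""
    found = set()
    for length in range(sum(limits) + 1):
        for candidate in product(symbols, repeat=length):
            if all(candidate.count(s) <= m for s, m in zip(symbols, limits)):
                found.add("".join(candidate))
    return found
```

Calling `count_strings((5, 3, 2))` gives the number of sentences for the limits in the question, which is also the number of alternatives a purely enumerative grammar would need before any prefix or suffix sharing.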

This is not really as dramatic a conclusion as it sounds. Most finite languages lack compact grammars; grammars achieve concision when they can use recursive constructs which represent infinite languages.

In fact, in an interesting paper cited in the answer to this question on https://cstheory.stackexchange.com/, there's a proof that no regular expression for the language $P_n$ consisting of the permutations of $\{1,2,\dots,n\}$ can be shorter than $2^n$. However, the (infinite) complement $\overline{P_n}$ can be described by a regular expression of size $O(n^2)$. (There are lots of other interesting results in that paper.) @yuval-filmus extended that result in "Lower Bounds for Context-Free Grammars" to a larger set of languages, including the language of permutations of the multiset of $n$ different elements in which each element appears $k$ times.

In the halcyon era of SGML, whose syntax description language included a permutation operator, there was some investigation into how to optimise state machines to recognise such languages. IIRC, the final conclusion was that it can't be done and therefore the permutation operator is not a good idea. In practice, such languages are generally recognised in two phases: first, the input is matched against the infinite language of unrestricted repetitions, and then a second scan (not implemented with a CFG) is done to count repetitions of each element. But that algorithm is not completely sound. First, it assumes that there is a unique decomposition of a string into elements, which is not always the case, although it certainly was the case for SGML. Second, the algorithm greedily processes all successive elements. If the intention of the grammar were to recognise the longest legal permutation, leaving open the possibility that (over-)repeated elements be part of a different syntactic construct, then legal inputs will fail because of the greediness.
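A minimal sketch of that two-phase approach, using the limits from the question. The names `LIMITS`, `UNRESTRICTED` and `is_valid` are assumptions for this example, not anything from SGML tooling. Because every element here is a single character and the whole input is matched at once, neither of the two caveats above (ambiguous decomposition, greediness) comes into play in this toy case.

```python
import re
from collections import Counter

# Per-symbol limits taken from the question's example.
LIMITS = {"a": 5, "b": 3, "c": 2}

# Phase 1 pattern: the infinite language of unrestricted repetitions.
UNRESTRICTED = re.compile(r"[abc]*")


def is_valid(text):
    # Phase 1: the input must consist of known symbols only.
    if UNRESTRICTED.fullmatch(text) is None:
        return False
    # Phase 2: a separate pass (not part of any grammar) counts repetitions.
    counts = Counter(text)
    return all(counts[symbol] <= limit for symbol, limit in LIMITS.items())
```

For instance, `is_valid("ababbc")` and `is_valid("ccbbbaaaaa")` both hold, while `is_valid("abcabcabc")` fails in the second phase because $c$ occurs three times.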

score 1 · Answer 2 · answered Mar 09 '20 at 04:40

This language is regular, so you can write a NFA or a regular grammar that accepts this language.
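One way to make this concrete is a right-linear grammar whose nonterminals record how many of each symbol have been produced so far. The sketch below is only an illustration under that assumption; `build_regular_grammar` and `accepts` are hypothetical names, and every nonterminal is taken to have an additional empty production, since any prefix of a valid string is itself valid.

```python
from itertools import product


def build_regular_grammar(limits):
    """Build a right-linear grammar for the bounded-count language.

    Nonterminals are tuples of counts used so far. rules[N] lists the
    productions N -> terminal N' as (terminal, N') pairs; in addition,
    every nonterminal is assumed to have a production N -> empty string.
    """
    symbols = list(limits)
    rules = {}
    for counts in product(*(range(limits[s] + 1) for s in symbols)):
        productions = []
        for i, s in enumerate(symbols):
            if counts[i] < limits[s]:
                # Emitting s moves to the nonterminal with one more s counted.
                successor = counts[:i] + (counts[i] + 1,) + counts[i + 1:]
                productions.append((s, successor))
        rules[counts] = productions
    start = (0,) * len(symbols)
    return start, rules


def accepts(grammar, text):
    """Run the grammar as a deterministic automaton over text."""
    state, rules = grammar
    for ch in text:
        transitions = dict(rules[state])
        if ch not in transitions:
            return False
        state = transitions[ch]
    return True  # the implicit empty production ends the derivation


grammar = build_regular_grammar({"a": 5, "b": 3, "c": 2})
```

For the limits in the question this yields $6 \cdot 4 \cdot 3 = 72$ nonterminals. In general the count is the product of (limit + 1) over all symbols, so it grows quickly as symbols are added.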

See "How to prove a language is regular?" for our reference material on this topic.

D.W.
