Let $G=(V, \Sigma, R, S)$ be a (non-regular) probabilistic context-free grammar, and $u_1, \ldots, u_n$ a set of $n$ strings generated by $G$.

For finite $n$, it is always possible to find a regular grammar $\hat G=(\hat V, \Sigma, \hat R, S)$ which generates the strings $u_1, \ldots, u_n$.
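One way to see this is to read a right-linear grammar off the prefix tree (trie) of the sample. The sketch below is only an illustration under my own conventions: the function name `trie_grammar`, the rule encoding as triples, and the sample $a^k b^k$ are all assumptions made for the example. It gives an upper bound on the size of $\hat R$ for this particular construction, not a lower bound.

```python
def trie_grammar(strings):
    """Build a strictly right-linear grammar generating exactly `strings`.

    Nonterminals are the prefixes of the sample (the trie nodes); the start
    symbol is the empty prefix "". A rule (lhs, ch, rhs) stands for
    lhs -> ch rhs, and (lhs, "", None) stands for lhs -> epsilon.
    """
    rules = set()
    for s in strings:
        for i, ch in enumerate(s):
            # One rule per trie edge: N_{s[:i]} -> ch N_{s[:i+1]}
            rules.add((s[:i], ch, s[:i + 1]))
        # One epsilon rule per sample string, marking it as accepted
        rules.add((s, "", None))
    return rules


# Hypothetical sample: the first n strings of the non-regular language a^k b^k.
for n in range(1, 6):
    sample = ["a" * k + "b" * k for k in range(1, n + 1)]
    rules = trie_grammar(sample)
    nonterminals = {lhs for lhs, _, _ in rules}
    print(n, len(rules), len(nonterminals))
```

With this construction both the number of rules and the number of nonterminals grow with the sample, but nothing here rules out a smaller regular grammar that generates a superset of the sample.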

Intuitively, as $n$ goes to infinity, we expect $\hat G$ to get larger: my guess is that the cardinality of $\hat R$ (and maybe also the cardinality of $\hat V$?) would need to go to infinity.

Are there results which formalize this, e.g. by giving a lower bound on these cardinalities as a function of $n$?

Robin Ryder
  • Are you referring to strictly regular grammars or are extended ones also allowed? – dkaeae Dec 20 '18 at 13:08
  • @dkaeae I am mostly interested in strictly regular grammars, but would also welcome answers related to extended ones. – Robin Ryder Dec 20 '18 at 13:38
  • If we replace probabilistic context-free grammar by (usual) context-free grammar in the question, does it make any difference? If it does, please clarify. – John L. Dec 21 '18 at 08:01
  • @Apass.Jack I would be equally happy with an answer about usual CFGs. Using PCFGs means that it might be possible to get a probabilistic statement, e.g. about the expected value of the cardinality of $\hat R$. – Robin Ryder Dec 21 '18 at 09:24

1 Answer

The question, if understood in the most naive way, might be uninteresting.

Here is a simple example. Let $G$ be the regular language $\{a^n\mid n\ge0\}$ over the alphabet $\{a\}$.

Consider the strings $\epsilon, a, a^2, \cdots, a^n$ in $G$. What are the regular languages that contain those strings?

  • The minimal such language, i.e., the one that contains no other strings, will need $n+1$ production rules $S\to a^i$ for $0\le i\le n$.
  • The language with the fewest production rules is $G$ itself, which has two production rules, $S\to \epsilon$ and $S\to aS$.
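The two cases above can be checked mechanically. This is a minimal sketch, assuming my own encoding of right-linear rules as triples `(lhs, terminals, next_nonterminal_or_None)`; the names `generates`, `minimal_rules` and `FEWEST_RULES` are made up for the illustration.

```python
def generates(rules, start, word):
    """Return True if the right-linear grammar `rules` derives `word` from `start`.

    Each rule is (lhs, terminals, nxt): lhs -> terminals nxt, where nxt is a
    nonterminal or None (no trailing nonterminal).
    """
    seen = set()
    stack = [(start, 0)]  # (current nonterminal, position reached in word)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        nonterminal, pos = state
        for lhs, terminals, nxt in rules:
            if lhs != nonterminal or not word.startswith(terminals, pos):
                continue
            end = pos + len(terminals)
            if nxt is None:
                if end == len(word):
                    return True  # derivation finished exactly at the end
            else:
                stack.append((nxt, end))
    return False


def minimal_rules(n):
    # S -> a^i for 0 <= i <= n: generates exactly the sample, n + 1 rules
    return [("S", "a" * i, None) for i in range(n + 1)]


# S -> epsilon and S -> aS: generates all of a*, two rules for every n
FEWEST_RULES = [("S", "", None), ("S", "a", "S")]

n = 5
sample = ["a" * i for i in range(n + 1)]
exact = minimal_rules(n)
assert all(generates(exact, "S", w) for w in sample)
assert all(generates(FEWEST_RULES, "S", w) for w in sample)
assert not generates(exact, "S", "a" * (n + 1))    # nothing beyond the sample
assert generates(FEWEST_RULES, "S", "a" * (n + 1)) # overgenerates
print(len(exact), len(FEWEST_RULES))  # 6 2
```

So the rule count grows with $n$ only if the regular grammar is required to generate no strings beyond the sample; otherwise it can stay constant.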

However, once we start to vary what it means to approximate a context-free grammar by a regular grammar, there is a large body of research.

Here is a related question that links to a lot of related material: "Is there a known method for constructing a grammar given a finite set of finite strings?".

This paper considers approximating CFG from above by a regular grammar.

You can browse the Google search results for "approximate context-free grammar by regular grammar", or the same query on Google Scholar.

John L.