I wish to find a CFG for the language over two symbols (say $a$ and $b$) whose words begin and end with the same symbol and contain equal numbers of $a$'s and $b$'s. What thought process should I use to find such a grammar? What is the most natural or simplest grammar for this language? Please explain your answer; I hope it will suggest some patterns to look for when trying to synthesise a grammar for a specified language.
Here's the best solution I could come up with on my own:
$ S \to aTbbTa \mid bTaaTb$
$ T \to abT \mid baT \mid aTb \mid bTa \mid \epsilon$
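As a quick illustration of how the rules are meant to be used (a worked example of my own, not a proof of anything), the word $aabbba$ can be derived by rewriting the first $T$ to $ab$ and the second $T$ to $\epsilon$:
$$ S \Rightarrow aTbbTa \Rightarrow a\,abT\,bbTa \Rightarrow a\,ab\,bbTa \Rightarrow aabbba $$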
I think this grammar is correct. Informal argument: if the word is $awa$ ($w$ being a substring), then $w$ contains two more $b$'s than $a$'s, so at least two $b$'s in $w$ must be adjacent. This suggests the form $aTbbTa$ in the first production rule. (The same argument holds with the roles of the two symbols reversed.) The second set of production rules is meant to generate every possible word while keeping the numbers of $a$'s and $b$'s equal. Symmetry suggests that the rule $abT$ should be accompanied by the rule $baT$, and $aTb$ by $bTa$. Initially I wondered whether any of the rules in the second set were redundant, but I don't think so: I can think of words that couldn't be formed if any one of them were missing. What I still need to be sure of is that there are no words in the language that my grammar can't generate.
[I guess I would need induction to prove my grammar generates every possible word in the language. But right now I'm more interested in the thought process behind coming up with a grammar, and as far as I know, induction (in general) doesn't help much in synthesising a solution/rule/formula/etc.; it principally serves to verify a purported solution.]
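Short of an inductive proof, one cheap way to probe for missing words is a brute-force comparison on short inputs. The following is only a sketch under my own assumptions: the function names (`in_language`, `t_words`, `s_words`, `missing_words`) are made up for illustration, the empty word is assumed not to belong to the language, and the enumeration simply applies the productions for $S$ and $T$ given above, length by length.

```python
from itertools import product


def in_language(word):
    """Target language (assumption: the empty word is excluded)."""
    return (len(word) > 0
            and word[0] == word[-1]
            and word.count("a") == word.count("b"))


def t_words(max_len):
    """Map each even length k <= max_len to the words T derives of length k."""
    by_len = {0: {""}}  # T -> epsilon
    for k in range(2, max_len + 1, 2):
        shorter = by_len[k - 2]
        words = set()
        for t in shorter:
            words.add("ab" + t)       # T -> abT
            words.add("ba" + t)       # T -> baT
            words.add("a" + t + "b")  # T -> aTb
            words.add("b" + t + "a")  # T -> bTa
        by_len[k] = words
    return by_len


def s_words(max_len):
    """All words of length <= max_len derivable from S."""
    t = t_words(max(max_len - 4, 0))
    out = set()
    for i, left in t.items():
        for j, right in t.items():
            if i + j + 4 > max_len:
                continue
            for x in left:
                for y in right:
                    out.add("a" + x + "bb" + y + "a")  # S -> aTbbTa
                    out.add("b" + x + "aa" + y + "b")  # S -> bTaaTb
    return out


def missing_words(max_len):
    """Words of the language, up to max_len, that the grammar fails to derive."""
    derived = s_words(max_len)
    return sorted(
        w
        for n in range(1, max_len + 1)
        for w in map("".join, product("ab", repeat=n))
        if in_language(w) and w not in derived
    )


if __name__ == "__main__":
    print(missing_words(8))
    print(missing_words(10))
```

If I have traced this correctly, nothing is missing up to length 8, but at length 10 the check reports words such as $aaabbbabba$. Its interior $aabbbabb$ can only be split around a $bb$ with balanced pieces on both sides as $aabbba \cdot bb \cdot \epsilon$, and $T$ cannot derive $aabbba$: it starts with $aa$ and ends with $a$, so none of $abT$, $baT$, $aTb$, $bTa$ applies. That suggests $T$ does not cover every balanced word, which is exactly the kind of gap I was worried about.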