I was following a report on the decipherment of the Maya script. One step in this process was the conclusion that a script with up to 500 glyphs had to be a syllabary. That got me thinking about how many phonetically different syllables the German language has. How many characters would you need for a German syllabary?

We have simple syllables like “o” (1 vowel) and rather complicated syllables like “streichst” (3 consonants (fricative-plosive-fricative) + 2 vowels + another 3 consonants (fricative-fricative-plosive); fricative-plosive-fricative also occurs in codas). We have typical syllables like “ma” and more unusual ones like “mungs” (as in Strömungsmechanik) or even “chur” (as in Fuchur, the dragon character from Michael Ende’s “Die Unendliche Geschichte”). I tried to set up and quantify rough rules for how syllables can be formed, based on the sounds used in German (no English “th”, no French diphthongs etc.), and calculated over 10 million possibilities (many of which probably do not actually occur in present-day German). However, 500 seems very little to me.

Are there any numbers?


I tried the following: First I transcribed a few German sentences into a kind of phonetic transcription of my own. (I don't know any phonetic script, and I didn't want to reproduce all the acoustic details, but rather wanted to proceed the way a speaker of the language might have done when devising a script many centuries ago.) Then I used a small program to determine how many syllables in each block of 20 syllables were new, i.e. had not yet appeared in the preceding text.

static String text = "däa be.grif ho.lo.do.moa u.kra.ii.ni§ töö.tuñ dur¤ hu.ña §teet füa deen tail däa hu.ñaß.noot in däa sø.wjäät.uun.joon in deen 9.10.100.3.ßig.äa jaa.rän in där u.kra.ii.ni.§ön so.tßja.lis.ti.§ön sø.wjäät.ree.puu.blik* in dii.säa uun.joonß.ree.puu.blik fii.län deem hu.ña §ää.tßuñß.wai.sö 3 biß 7.1000000 män.§ön tsum ø.pfa* dii u.kra.ii.nö bee.müüt si¤ sait däa un.ab.häñ.i¤.kait 19.100.1.unt.90 um ai.nö in.täa.na.tsjoo.naa.lö an.äa.kän.nuñ däß ho.lo.do.moaß alß föl.ka.moat* dii.sö bee.wea.tuñ ißt um.§tri.tön unt wiat isß.bee.son.dä.rø fon dea ree.gii.ruñ ruß.lndß kri.ti.siat* in.tea.na.tjo.naal fin.dät sii a.ba im.ma mea tsu.§tim.muñ*\n"
        + "gil.ti plä.¥a døit§ ät.wa sün.dii.gäß oo.da §ult.bee.wuß.täß fäa.gnüü.gän ißt ain äñ.li§.§praa.xii.gäa pøp.ßoñ* mit deem tii.töl fea.trat kro.aa.tßi.en baim øi.ro.wi.¥ön ßoñ.kon.täst 20.2.unt.20 in tuu.riin*\n"
        + "ain §iam.maa.xa in äl.tä.rän bee.tsaich.nu.ñön aux um.bäl.laa.rii.uß paa.raa.plui.maa.xa oo.da paa.raa.ßøl.maa.xa änt.wirft unt fäa.tikt §ia.mö* dea an.äa.kan.tö auß.bil.duñß.be.ruuf wirt in dea gru.pä dea hølts.hant.wäa.ka gee.füat op.wool in ree.gen o.da so.nøn.§ia.mön läñßt fii.lö mee.tal unt kunßt.§tøf.tai.lö fer.wän.döt wea.dön* dii §iam.maa.xa gee.höö.rön ai.nöm mit.la.wai.lö säl.tö.nön bee.rufs.tswaig an*\n"
        + "daß sü.ri§ oa.too.do.kße kloo.ßta moa ja.kop tsa.lee ißt ai.nös dea el.täß.tön krißt.li.chön klöö.ßta dea wält* eß ilkt im tuur ab.diin in dea süüd.øßt.tüa.kai im døaf saa.lee unt ißt ai.nöß dea klöö.ßta dea süü.ri§.øa.too.dø.ksön kia.¤ö fon an.ti.ø.xi.ön* auf.grunt sai.na ään.li¤.kait mit deem kloo.ßta moa gaa.brii.äl wirt dii glai.xö ent.§tee.uñkß.tßait 6.100 naax kriß.tuß fäa.muu.töt* äß wirt tßu.deem fäa.muu.töt daß dii nöat.lix däß kloo.ßtaß gee.lee.gö.nö an.laa.gö auß §tain.kwaa.dan dii räß.tö ai.nöß hait.ni.§ön kult.bauß bee.hea.beakt* 2 böö.gön dii.sa an.laa.gö sint hoi.tö nox äa.ken.baa*\n"
        + "d¥uñ.kö bee.tsai.xnöt ai.nö fiil.tsaal 1 o.da mea.maß.ti.ga see.göl.§if.tüü.pön traa.di.tsjo.näl.la bau.aat in ¤ii.na* dii d¥uñ.kön faa.rön altß han.döls laß.tön o.da fi.§ä.rai.§i.fö auf deen ¤ii.nee.si.§ön flü.ßön deen küß.tön.gee.wä.ßan unt dea hox.see* oft.malß wea.den sie alß hauß.boo.tö gee.nutßt* dii gröö.ßä.rön d¥uñ.kön ha.bön ain fa.ßuñkß.fea.möö.gön fon 4.100 biß 5.100 ree.giß.ta.ton.nön*\n"
        + "dii höö.fa.§pi.tßä ißt ain 2.1000.100.1.unt.3.ßig mee.ta üü.bea deem a.dri.aa.ti.§ön mee.röß.§pii.göl hoo.a bäak in deen al.goi.ja al.pön*\n"
        + "roo.lant buu.ti ai.gänt.lix roo.lant büü.tii.koo.fa ißt ain §wai.tßa §rift.stäl.la unt lee.ra*\n"
        + "sül.fään von waa.lii.si§ fun.daa.mänt ißt ai.nö see.rii.fön.§rift füa mee.rä.rö §rift.süß.tee.mö die 19.100.8.unt.9.tßig fon d¥øn hat.ßön da.böl.juu roß milß unt ¥ee.ral.diin wäjt füa mai.kro.ßoft änt.wi.kält wua.dö* 19.100.7.unt.9.tßig §täl.tö mai.kro.ßoft den auf.traak ai.nö §rift.aat füa mee.rö.rö §rift.süß.tee.mö tßuu änt.wäa.fön* dii fäa.ti.gö §rift.aat ent.hält 3.1000.8.100.2.und.4.tßig §rift.tßai.¤ön unt un.ta.§tütßt dii la.tai.ni.§ö küü.ril.li.§ö grii.¤i.§ö a.mee.ni.§ö gee.øa.gi.§ö unt dii ää.ti.oo.pi.§ö §rift*\n"
        + "nuu.wäl ree.wüü froñ.ßääß ißt ai.nö fran.zöö.si.§ö li.tä.ra.tua.tßait.§rift dii sait deem 1.tön fee.bruu.aa 19.100.9 mit un.ta.brä.¤u.ñön wää.rönt dea bai.dön wält.krii.gö unt nax deem 2.tön wält.kriik biß hoi.tö ea.§aint* sii wirt fom fea.laak ga.lii.maar hea.rauß.gö.gee.bön*";
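// Needs: import java.util.HashSet; import java.util.Set; and an enclosing class.
public static void main(String[] args) {
    // Split on runs of separators so that "* " or "*\n" does not yield empty "syllables".
    String[] syllables = text.split("[ .\n*]+");
    Set<String> seen = new HashSet<>();
    int countNew = 0;
    // Start at index 0 so that the very first syllable is registered as well.
    for (int i = 0; i < syllables.length; i++) {
        if (seen.add(syllables[i])) countNew++; // add() returns true only for unseen syllables
        if ((i + 1) % 20 == 0) {
            // Output: syllables processed so far : new syllables among the last 20
            System.out.println((i + 1) + ":" + countNew);
            countNew = 0;
        }
    }
}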

As expected, the number goes down, and it has to go down to 0 at some point for an infinite amount of meaningful German text.

In the second step, I had the spreadsheet fit a function to the graph and extended it over more text. You can see it here (columns D24:F507):

Calculation

This leads me to roughly 1,500 syllables. Of course this estimate could be made more precise with more text, but to be honest I would have expected a lot more syllables!
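The estimate above is in effect a guess at where the running total of distinct syllables levels off. A minimal sketch of that running total, assuming the same dotted transcription format as the program above; the class name, the method name and the short sample string are hypothetical and only serve as illustration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SyllableGrowth {
    // Returns the number of distinct syllables seen so far after each full window.
    static List<Integer> cumulativeDistinct(String text, int windowSize) {
        Set<String> seen = new HashSet<>();
        List<Integer> totals = new ArrayList<>();
        int processed = 0;
        for (String syllable : text.split("[ .\\n*]+")) {
            if (syllable.isEmpty()) continue; // a leading separator yields one empty token
            seen.add(syllable);
            processed++;
            if (processed % windowSize == 0) totals.add(seen.size());
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical mini sample, not taken from the real corpus.
        String sample = "in tuu.riin mit deem in tuu.riin";
        System.out.println(cumulativeDistinct(sample, 4)); // prints [4, 5]
    }
}
```

With a real corpus, the point at which these totals stop growing would be the number of syllables the text actually uses.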

guidot
Maron
  • Just for comparison, Unicode reserves 11,172 code points for Hangul, the Korean script. (It isn't a true syllabary, since each character is written as a combination of up to three letters.) This allows for 19 initial consonants, 21 medial vowels, and 28 final consonants. Korean is poorer in sounds than German, so the number of possible syllables in German is probably much greater. I don't know what the practical upper limit is, but I'm guessing Chinese, with several thousand characters, is close. – RDBury Apr 08 '22 at 16:43
  • "However, 500 seems very little to me." - as a very rough comparison: The "closure" of Mandarin syllables based on the Pinyin syllable table is 880. Many of the cells are blank in there, though that is somewhat compensated for by the "er" contraction described further down in that article. Times 5 for the different tones (not all of which are used for every syllable) gives you 4400 as an upper bound. And that's apparently a sufficient number of syllables for a vocabulary whose "words" often have just one or two, sometimes three, syllables. – O. R. Mapper Apr 08 '22 at 20:43
  • @RDBury: "I don't know what the practical upper limit is, but I'm guessing Chinese, with several thousand characters, is close." - while Chinese characters more or less represent one syllable each, plenty of spoken syllables can be represented by different characters depending on which meaning is intended, and some characters are spoken as different syllables depending on the context/meaning. Thus, the number of Chinese characters is probably not a meaningful indicator for the number of occurring syllables. – O. R. Mapper Apr 08 '22 at 20:48
  • @O. R. Mapper: Yes, I wasn't clear about that. What I meant was that it's probably impractical to try to teach more than a few thousand characters in a writing system, no matter what they represent. People spend years learning just the fraction of Chinese characters needed for basic literacy. The question wasn't entirely clear, but I thought it was more about how many syllables are possible, not how many are actually used; I'm sure they are very different numbers. – RDBury Apr 08 '22 at 23:51
  • I'm not sure of any specific number, but I'd definitely be interested. My question would be, though, whether you make a distinction between "mung" and the ending normally thought of as "-ung". At least generally, I'm not sure that Germans would ever split the word in that way, which reduces many distinct syllables, e.g. Strö-mung, Hei-zung, Erkäl-tung, etc., to one, '-ung'. – HeWhoHatches Apr 08 '22 at 20:23
  • @HeWhoHatches: Yes, I would assume three different syllables “muŋ”, “tuŋ” and “tßuŋ” here (→ muŋ-go-bo-nøn, tuŋ-ga, tßuŋ-ø). [I have no phonetic skills. This is my own transcription.] “-ung” as an ending is IMHO more a cognitive noun-of-verb marker than a syllable. Yes, spelling variants would also be conceivable, for example “Zunge” could be “tßuŋ-ø” or “tßu-ŋø”. Spelling variants are another topic. The Maya script has up to 15 glyphs for the same common syllable. Similarly, the raw text of “Eulenspiegel” has 8 different spellings for “Eulenspiegel”, used randomly throughout the text. – Maron Apr 09 '22 at 19:16

1 Answer

The number of syllables varies drastically from language to language. The Japanese syllabaries, for instance, have as few as 48 base characters (though there are diacritics and special letters), which is very close to the number of signs of the largest alphabets (possibly Devanagari with 47 base characters).

German has complex phonotactics and an above-average number of vowels, which may add up to a few hundred thousand potential syllables. However, a syllabary for German with a reasonable number of characters (far less than 500) is perfectly possible when you insert vowels in consonant clusters. By that method, «sprichst» might be written as ʃo-po-ri-chi-so-to, or «Markt» as ma-ra-ko-to (for instance). Things could be much improved when you add a few extrasyllabic characters, especially for leading ʃ-, trailing -t and -s, and diacritics for prevocalic -r- and -l-. With these improvements, «sprichst» could be ʃ-pʳi-chi-s-t, «Markt» would be ma-ra-ka-t. Or there could be additional diacritics for vowel offglides (e.g. -w, -j, -n, -r, -l), which would allow writing «Markt» as maʳ-ka-t, or simply a set of standalone consonants (though that feels like cheating to me).

Counting German syllables

I think there are many different ways of counting German syllables. I wonder what method you have used for counting more than 10 million syllables. I am going to outline an attempt at a syllable count.

The number of syllables can be approximated in the following way:

  1. Count the number of syllable onsets.
  2. Count the number of syllable nuclei.
  3. Count the number of syllable codas.
  4. Multiply.

I am counting around 51 onsets: /m ʃm n ʃn ɡn kn p pr pl ʃp ʃpr ʃpl t tr ʃt ʃtr k kr b br bl d dr ɡ ɡr ɡl pf pfr pfl ts tsv tʃ f fr fl s ʃ ʃr ʃl x h ʋ ʋr ʃʋ kʋ j l ʃl r ʃr/ or zero.

The nucleus can have two elements: a core vowel, which can be any of /i e æ y ø u o a/, and an offglide, which can be any of /j w ː l r n/ or zero. Not all combinations are possible, though:

  • After /a/, all offglides are possible, which gives 7 combinations.
  • After /o/, all offglides but /w/ are possible, which gives another 6 combinations.
  • After the other vowels /i e æ y ø u/, only the offglides /ː l r n/ or zero are possible, which gives another 30 combinations.

By this count, the total number of nuclei would be 43.
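The nucleus count can be checked by enumerating the combinations under the three rules above. This is only a sketch: the class and method names are hypothetical, and the strings are ASCII stand-ins for the phonemes ("ae" for /æ/, "oe" for /ø/, ":" for length, "" for the zero offglide).

```java
import java.util.ArrayList;
import java.util.List;

class NucleusCount {
    // ASCII stand-ins for the eight core vowels and the seven offglide options.
    static final String[] VOWELS = {"i", "e", "ae", "y", "oe", "u", "o", "a"};
    static final String[] OFFGLIDES = {"", "j", "w", ":", "l", "r", "n"};

    // Encodes the three rules from the bullet list above.
    static boolean allowed(String vowel, String offglide) {
        if (vowel.equals("a")) return true;                    // all seven options
        if (vowel.equals("o")) return !offglide.equals("w");   // everything but /w/
        return !offglide.equals("j") && !offglide.equals("w"); // zero, length, l, r, n
    }

    static List<String> nuclei() {
        List<String> result = new ArrayList<>();
        for (String vowel : VOWELS) {
            for (String offglide : OFFGLIDES) {
                if (allowed(vowel, offglide)) result.add(vowel + offglide);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nuclei().size()); // prints 43 (7 + 6 + 6 * 5)
    }
}
```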

The codas can be expanded by adding -t, -s, or -st. If you were taking into account contractions such as «wirft’s» or «wirfst’s», you could even add -ts and -sts. As in the nuclei, not all combinations are possible:

  • I am counting 12 codas that occur by themselves or with all three expansions, which gives a total of 48 possibilities: /m n p k b ɡ pf tʃ f x l r/ + /0 t s st/.
  • Another 8 codas can occur by themselves or with one of the expansions, -t in the case of those that end with /s/, -s in the case of those that end with /t/ or /d/. This gives a total of another 16 possibilities: /s ts ks t ft st xt d/ + /0 (s|t)/.

This count gives a total of 64 codas.

Multiplying the onsets, nuclei, and codas, I arrive at around 140,000 potential syllables: 51 * 43 * 64 = 140,352
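The arithmetic behind this estimate can be reproduced in a few lines. The sketch below takes the three counts exactly as given in the text and does not re-derive them from the phoneme lists; the class and method names are hypothetical.

```java
class SyllableEstimate {
    // 12 codas with four variants each (bare, -t, -s, -st)
    // plus 8 codas with two variants each (bare, one expansion).
    static int codaCount() {
        return 12 * 4 + 8 * 2;
    }

    // Simple product of the three slot counts; ignores the
    // cross-slot restrictions between nucleus and coda.
    static long potentialSyllables(int onsets, int nuclei, int codas) {
        return (long) onsets * nuclei * codas;
    }

    public static void main(String[] args) {
        int codas = codaCount();
        System.out.println(codas);                             // prints 64
        System.out.println(potentialSyllables(51, 43, codas)); // prints 140352
    }
}
```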

Note that the method is simplified. Some combinations of nucleus offglide and coda are mutually exclusive, especially the consonantal offglides /l r n/ followed by the same consonant in the coda. Therefore, the actual count should be reduced. On the other hand, I have disregarded zero codas, which are mutually exclusive with zero offglides. Therefore, the actual count should be increased. I hope the two effects roughly cancel each other out, though I do not know for sure.

Note also that the count is fuzzy. I have left out onset clusters that only occur in foreign or regional words. While /ps/ may be reasonably common, /pn/ is rarer, and /kt/ occurs only in a few specialist terms. In southern regional words, many onsets can be prefixed by /k/, and some by /p/. Other fringe phenomena include the diphthong /uj/, which only occurs in a few onomatopoeias, or the foreign sounds you have mentioned already. Also, schwa syllables could be counted separately, though they have much reduced phonotactics.

mach
  • "a syllabary for German with a reasonable number of characters (far less than 500) is perfectly possible when you insert vowels in consonant clusters. (...) a set of standalone consonants (though that feels like cheating to me)" - I am not convinced padding those standalone consonants with vowels that do not appear in the actual language, just to make the consonants look a bit like syllables, counts any less as "cheating". – O. R. Mapper May 14 '22 at 21:57
  • @O.R.Mapper: I guess the reason why including standalone consonants feels like cheating to me is that a syllabary that includes both standalone vowels and standalone consonants is a superset of a regular alphabet, so why not have just the alphabet then? – mach May 22 '22 at 07:57
  • Sure, but my point is that "vowel-padded consonants" are effectively standalone consonants, at best lightly disguised. Whether you write "Trumpf" with standalone consonants attached to a "syllable" "t - rum - p - f", or with vowel-padded consonants, e.g. "ta - rum - pa - fa", makes no difference as the word you're writing is still "Trumpf", not "Tarumpafa". You can pretend "ta" etc. is not a standalone consonant because it's the letter for syllable "ta", but its phonetic value when read in a word is still just "t". Thus, effectively, your syllabary will again contain standalone consonants. – O. R. Mapper May 22 '22 at 09:50
  • @O.R.Mapper: Well of course I will argue that the fact that a TA syllable sign represents the TA syllable makes it a syllable sign (for TA), no matter whether it can also represent a standalone consonant. When representing a standalone consonant, it would probably be interpreted as a syllable with a voiceless vowel, similar to the way Japanese is interpreted. – mach May 22 '22 at 16:54