I was following a report on the decipherment of the Maya script. One step in this process was the conclusion that a script with up to 500 glyphs had to be a syllabary. That got me thinking about how many phonetically distinct syllables the German language has. How many characters would you need for a German syllabary?
We have simple syllables like “o” (a single vowel) and rather complicated ones like “streichst”: three consonants (fricative, plosive, fricative), then two vowels (a diphthong), then another three consonants (fricative, fricative, plosive); the order fricative, plosive, fricative also occurs in the coda. We have typical syllables like “ma” and more unusual ones like “mungs” (as in Strömungsmechanik) or even “chur” (as in Fuchur, the dragon in Michael Ende’s “Die Unendliche Geschichte”). I've tried to set up and quantify rough rules for how syllables can be formed, based on the sounds used in German (no English “th”, no French diphthongs etc.), and arrived at over 10 million possibilities (many of which presumably do not actually occur in present-day German). Compared with that, 500 seems very few to me.
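To make this kind of estimate concrete, here is a minimal sketch of an upper bound that ignores all phonotactic restrictions: any consonant cluster, then any nucleus, then any consonant cluster. The inventory sizes used in `main` (20 consonants, 18 vowel nuclei) and the limit of three consonants on each side are placeholder assumptions for illustration only, not the rules behind the 10 million figure above; `SyllableBound`, `clusters` and `countShapes` are made-up names.

```java
public class SyllableBound {

    // Number of consonant strings of length 0..maxLen over an inventory of the given size.
    static long clusters(int consonants, int maxLen) {
        long total = 0;
        long power = 1; // consonants^len, starting with the empty cluster
        for (int len = 0; len <= maxLen; len++) {
            total += power;
            power *= consonants;
        }
        return total;
    }

    // Upper bound: any onset cluster x any nucleus x any coda cluster.
    static long countShapes(int consonants, int nuclei, int maxOnset, int maxCoda) {
        return clusters(consonants, maxOnset) * nuclei * clusters(consonants, maxCoda);
    }

    public static void main(String[] args) {
        // Placeholder inventory sizes, chosen only for illustration.
        System.out.println(countShapes(20, 18, 3, 3));
    }
}
```

With these placeholder values the unrestricted bound comes out at roughly 1.3 billion, so any realistic set of rules has to prune the raw combinations very heavily.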
Are there any actual numbers for this?
I tried the following: First I transcribed a few sentences of German into a kind of improvised phonetic spelling. (I don't know any standard phonetic alphabet, and I didn't want to reproduce all the acoustic details, but rather wanted to proceed the way a speaker of the language who invented a script might have done in their mind many centuries ago.) Then I used a small program to determine how many of every 20 syllables were new, i.e. had not yet appeared in the preceding text.
static String text = "däa be.grif ho.lo.do.moa u.kra.ii.ni§ töö.tuñ dur¤ hu.ña §teet füa deen tail däa hu.ñaß.noot in däa sø.wjäät.uun.joon in deen 9.10.100.3.ßig.äa jaa.rän in där u.kra.ii.ni.§ön so.tßja.lis.ti.§ön sø.wjäät.ree.puu.blik* in dii.säa uun.joonß.ree.puu.blik fii.län deem hu.ña §ää.tßuñß.wai.sö 3 biß 7.1000000 män.§ön tsum ø.pfa* dii u.kra.ii.nö bee.müüt si¤ sait däa un.ab.häñ.i¤.kait 19.100.1.unt.90 um ai.nö in.täa.na.tsjoo.naa.lö an.äa.kän.nuñ däß ho.lo.do.moaß alß föl.ka.moat* dii.sö bee.wea.tuñ ißt um.§tri.tön unt wiat isß.bee.son.dä.rø fon dea ree.gii.ruñ ruß.lndß kri.ti.siat* in.tea.na.tjo.naal fin.dät sii a.ba im.ma mea tsu.§tim.muñ*\n"
+ "gil.ti plä.¥a døit§ ät.wa sün.dii.gäß oo.da §ult.bee.wuß.täß fäa.gnüü.gän ißt ain äñ.li§.§praa.xii.gäa pøp.ßoñ* mit deem tii.töl fea.trat kro.aa.tßi.en baim øi.ro.wi.¥ön ßoñ.kon.täst 20.2.unt.20 in tuu.riin*\n"
+ "ain §iam.maa.xa in äl.tä.rän bee.tsaich.nu.ñön aux um.bäl.laa.rii.uß paa.raa.plui.maa.xa oo.da paa.raa.ßøl.maa.xa änt.wirft unt fäa.tikt §ia.mö* dea an.äa.kan.tö auß.bil.duñß.be.ruuf wirt in dea gru.pä dea hølts.hant.wäa.ka gee.füat op.wool in ree.gen o.da so.nøn.§ia.mön läñßt fii.lö mee.tal unt kunßt.§tøf.tai.lö fer.wän.döt wea.dön* dii §iam.maa.xa gee.höö.rön ai.nöm mit.la.wai.lö säl.tö.nön bee.rufs.tswaig an*\n"
+ "daß sü.ri§ oa.too.do.kße kloo.ßta moa ja.kop tsa.lee ißt ai.nös dea el.täß.tön krißt.li.chön klöö.ßta dea wält* eß ilkt im tuur ab.diin in dea süüd.øßt.tüa.kai im døaf saa.lee unt ißt ai.nöß dea klöö.ßta dea süü.ri§.øa.too.dø.ksön kia.¤ö fon an.ti.ø.xi.ön* auf.grunt sai.na ään.li¤.kait mit deem kloo.ßta moa gaa.brii.äl wirt dii glai.xö ent.§tee.uñkß.tßait 6.100 naax kriß.tuß fäa.muu.töt* äß wirt tßu.deem fäa.muu.töt daß dii nöat.lix däß kloo.ßtaß gee.lee.gö.nö an.laa.gö auß §tain.kwaa.dan dii räß.tö ai.nöß hait.ni.§ön kult.bauß bee.hea.beakt* 2 böö.gön dii.sa an.laa.gö sint hoi.tö nox äa.ken.baa*\n"
+ "d¥uñ.kö bee.tsai.xnöt ai.nö fiil.tsaal 1 o.da mea.maß.ti.ga see.göl.§if.tüü.pön traa.di.tsjo.näl.la bau.aat in ¤ii.na* dii d¥uñ.kön faa.rön altß han.döls laß.tön o.da fi.§ä.rai.§i.fö auf deen ¤ii.nee.si.§ön flü.ßön deen küß.tön.gee.wä.ßan unt dea hox.see* oft.malß wea.den sie alß hauß.boo.tö gee.nutßt* dii gröö.ßä.rön d¥uñ.kön ha.bön ain fa.ßuñkß.fea.möö.gön fon 4.100 biß 5.100 ree.giß.ta.ton.nön*\n"
+ "dii höö.fa.§pi.tßä ißt ain 2.1000.100.1.unt.3.ßig mee.ta üü.bea deem a.dri.aa.ti.§ön mee.röß.§pii.göl hoo.a bäak in deen al.goi.ja al.pön*\n"
+ "roo.lant buu.ti ai.gänt.lix roo.lant büü.tii.koo.fa ißt ain §wai.tßa §rift.stäl.la unt lee.ra*\n"
+ "sül.fään von waa.lii.si§ fun.daa.mänt ißt ai.nö see.rii.fön.§rift füa mee.rä.rö §rift.süß.tee.mö die 19.100.8.unt.9.tßig fon d¥øn hat.ßön da.böl.juu roß milß unt ¥ee.ral.diin wäjt füa mai.kro.ßoft änt.wi.kält wua.dö* 19.100.7.unt.9.tßig §täl.tö mai.kro.ßoft den auf.traak ai.nö §rift.aat füa mee.rö.rö §rift.süß.tee.mö tßuu änt.wäa.fön* dii fäa.ti.gö §rift.aat ent.hält 3.1000.8.100.2.und.4.tßig §rift.tßai.¤ön unt un.ta.§tütßt dii la.tai.ni.§ö küü.ril.li.§ö grii.¤i.§ö a.mee.ni.§ö gee.øa.gi.§ö unt dii ää.ti.oo.pi.§ö §rift*\n"
+ "nuu.wäl ree.wüü froñ.ßääß ißt ai.nö fran.zöö.si.§ö li.tä.ra.tua.tßait.§rift dii sait deem 1.tön fee.bruu.aa 19.100.9 mit un.ta.brä.¤u.ñön wää.rönt dea bai.dön wält.krii.gö unt nax deem 2.tön wält.kriik biß hoi.tö ea.§aint* sii wirt fom fea.laak ga.lii.maar hea.rauß.gö.gee.bön*";
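// Requires java.util.Arrays, java.util.HashSet, java.util.List and java.util.Set;
// this method lives in the same class as `text`.
public static void main(String[] args) {
    // Split on runs of separators (space, '.', newline, '*') so that
    // adjacent separators such as "* " do not produce empty "syllables".
    List<String> syllables = Arrays.asList(text.split("[ .\n*]+"));
    Set<String> seen = new HashSet<>();
    int countNew = 0;
    // Start at index 0 so the very first syllable is counted too.
    for (int i = 0; i < syllables.size(); i++) {
        // add() returns true only the first time a syllable is seen.
        if (seen.add(syllables.get(i))) countNew++;
        // After every block of 20 syllables, print "syllablesSoFar:newInThisBlock".
        if ((i + 1) % 20 == 0) {
            System.out.println((i + 1) + ":" + countNew);
            countNew = 0;
        }
    }
}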
As expected, the number of new syllables per block goes down. Since the set of syllables is finite, it has to drop to 0 at some point, given an unlimited amount of meaningful German text.
In the second step, I had the spreadsheet fit a trend function to the graph and extended it over more text. You can see this here (columns D24:F507):
This leads me to roughly 1,500 syllables. Of course, the estimate could be made more precise with more text, but to be honest I would have expected a lot more syllables!
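The extrapolation step can also be reproduced without a spreadsheet. The sketch below assumes that the number of new syllables per block decays exponentially, which may not be the trend function the spreadsheet fitted. It fits that decay by least squares on the logarithm of the counts and adds the fitted tail to the syllables already observed. The counts in `main` are made-up sample values, not the measured ones, and `DecayFit`, `fitExponential` and `estimateTotal` are hypothetical names.

```java
public class DecayFit {

    // Least-squares fit of ln(count) = ln(a) - b * k over blocks with count > 0.
    // Returns {a, b} for the model count(k) = a * exp(-b * k).
    static double[] fitExponential(int[] newPerBlock) {
        double n = 0, sumK = 0, sumY = 0, sumKK = 0, sumKY = 0;
        for (int k = 0; k < newPerBlock.length; k++) {
            if (newPerBlock[k] <= 0) continue; // log undefined; skip empty blocks
            double y = Math.log(newPerBlock[k]);
            n++;
            sumK += k;
            sumY += y;
            sumKK += (double) k * k;
            sumKY += k * y;
        }
        double slope = (n * sumKY - sumK * sumY) / (n * sumKK - sumK * sumK);
        double intercept = (sumY - slope * sumK) / n;
        return new double[] { Math.exp(intercept), -slope };
    }

    // Observed distinct syllables plus the fitted tail summed over all future blocks.
    static double estimateTotal(int[] newPerBlock) {
        double[] ab = fitExponential(newPerBlock);
        double a = ab[0], b = ab[1];
        // Also rejects NaN, e.g. when fewer than two blocks are usable.
        if (!(b > 0)) throw new IllegalArgumentException("counts do not decay");
        long observed = 0;
        for (int c : newPerBlock) observed += c;
        int next = newPerBlock.length; // index of the first unseen block
        // Geometric series: sum over k >= next of a * exp(-b * k).
        double tail = a * Math.exp(-b * next) / (1 - Math.exp(-b));
        return observed + tail;
    }

    public static void main(String[] args) {
        int[] sample = { 16, 8, 4, 2, 1 }; // made-up counts, halving each block
        System.out.println(estimateTotal(sample));
    }
}
```

An exponential drops off quickly, while rare syllables probably keep turning up in long texts, so a figure obtained this way is better read as a lower bound than as a final count.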