I found many databases containing most German words, but none of them include the pronunciation of the words (in phonetic spelling). Something like:

Hund - Húnd
Olivenbaum - oˈliːvn̩ba͜um

would be great.

Wiktionary is also not sufficient, as it cannot be easily parsed by a computer. Ideally the dataset would be free, but it does not have to be.

LenzAbi

2 Answers

Wiktionary is also not sufficient, as it cannot be easily parsed by a computer.

Umm, I beg to differ. It's a one-liner.

$ curl 2>/dev/null -o- https://de.wiktionary.org/wiki/Hund|grep -m1 'IPA'|sed 's/<\/span>\]<\/dd>//;s/.*>//'
hʊnt

$ curl 2>/dev/null -o- https://de.wiktionary.org/wiki/Olivenbaum|grep -m1 'IPA'|sed 's/<\/span>\]<\/dd>//;s/.*>//'
oˈliːvn̩ˌbaʊ̯m

If you can't work out how to do it yourself from that example, give me your word list and I'll let my computer do it for you.
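For a long word list, a minimal sketch is to wrap the same grep/sed extraction in a loop that reads one word per line and pauses between requests, which keeps the request rate modest. It assumes a file with one word per line and that the page markup still matches what the one-liners expect; the function names `extract_ipa` and `fetch_list` and the one-second delay are illustrative choices, not anything Wiktionary prescribes.

```shell
# extract_ipa: read a de.wiktionary.org HTML page on stdin and print the
# first IPA transcription. Same logic as the one-liners above: it assumes the
# first line mentioning "IPA" ends in ...>TRANSCRIPTION</span>]</dd>.
extract_ipa() {
    grep -m1 'IPA' | sed 's|</span>]</dd>||; s|.*>||'
}

# fetch_list FILE: for every word in FILE (one per line), download its page,
# print "word<TAB>transcription" and wait a second before the next request.
# Words without a match are printed with an empty transcription.
fetch_list() {
    while IFS= read -r word; do
        [ -n "$word" ] || continue   # skip blank lines
        ipa=$(curl -s "https://de.wiktionary.org/wiki/$word" | extract_ipa)
        printf '%s\t%s\n' "$word" "$ipa"
        sleep 1
    done < "$1"
}
```

Usage: `fetch_list words.txt > ipa.tsv`. At one request per second (plus download time), 1.2 million words would take at least about two weeks, since 1,200,000 s ≈ 13.9 days; that is another argument for working from a dump instead.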

Janka
  • Good answer, but wrong SE :D – infinitezero Dec 10 '22 at 16:06
  • Thanks for your answer! My current word list has approximately 1.2 million entries. Is there a risk of getting IP-banned from Wiktionary if I send too many requests? – LenzAbi Dec 11 '22 at 16:09
  • But since the required information can be extracted with a single line of code, it may be better to download the Wiktionary dump instead, to avoid sending too many requests. – LenzAbi Dec 11 '22 at 16:12
  • The dump is in MediaWiki format; it's even easier to parse. – Janka Dec 12 '22 at 02:23
  • @Janka The question was asked again here in German, in case you want to translate your answer. – xyldke Dec 12 '22 at 15:29
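A rough sketch of the dump route: in the wikitext, de.wiktionary appears to mark transcriptions with `{{Lautschrift|...}}` templates (an assumption based on the site's markup, worth checking against a few pages), so a single awk pass over the decompressed dump can pair each page title with its first such template. The dump file name in the usage line follows the usual dumps.wikimedia.org naming and is also an assumption, as is the function name `dump_ipa`. On pages covering several languages, the first transcription is not guaranteed to be the German one.

```shell
# dump_ipa: read a de.wiktionary XML dump on stdin and print
# "title<TAB>transcription" for the first {{Lautschrift|...}} on each page.
# Assumes <title> sits on its own line, as in MediaWiki XML dumps.
dump_ipa() {
    awk '
        /<title>/ {
            title = $0
            sub(/.*<title>/, "", title)
            sub(/<\/title>.*/, "", title)
            found = 0                      # new page: reset the flag
        }
        !found && match($0, /[{][{]Lautschrift[|][^}|]*/) {
            # skip the 14 characters of "{{Lautschrift|"
            print title "\t" substr($0, RSTART + 14, RLENGTH - 14)
            found = 1
        }
    '
}
```

Usage, once the dump has been downloaded: `bzcat dewiktionary-latest-pages-articles.xml.bz2 | dump_ipa > ipa.tsv`. Everything runs locally, so request limits no longer matter.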

Yes, there is:

The Deutsche Aussprachedatenbank ('German pronunciation database', DAD for short) is a free scientific online database from the University of Halle-Wittenberg with over 130,000 entries, based on the Deutsches Aussprachewörterbuch.

It contains your examples:

amadeus
  • $ curl 2>/dev/null -o- https://dad.sprechwiss.uni-halle.de/dokuwiki/doku.php/de/O/Olivenbaum|grep -m1 "de.ipa"|sed 's/\]<\/td>//;s/.*>\[//' if anyone wants to harvest that. – Janka Dec 14 '22 at 19:14
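Building on the DAD one-liner in the comment above, a hedged sketch for harvesting a list. The URL layout (`.../doku.php/de/<first letter>/<word>`) is inferred from the single Olivenbaum example, and the markup assumption (the first line mentioning `de.ipa` contains `>[TRANSCRIPTION]</td>`) is inferred from that comment's sed expression; both are unverified, and the function names `dad_url` and `extract_dad_ipa` are hypothetical. With about 130,000 entries, the database can cover at most roughly a tenth of a 1.2-million-word list.

```shell
# dad_url WORD: build the DAD page URL, assuming the layout seen in
# .../doku.php/de/O/Olivenbaum (a directory named after the first letter).
# Note: shells that are not locale-aware may split a multibyte initial like Ä.
dad_url() {
    word=$1
    first=${word%"${word#?}"}          # first character of the word
    printf 'https://dad.sprechwiss.uni-halle.de/dokuwiki/doku.php/de/%s/%s\n' \
        "$first" "$word"
}

# extract_dad_ipa: read a DAD page on stdin and print the first transcription,
# using the same grep/sed logic as the comment above (drop "]</td>", then
# everything up to the last ">[").
extract_dad_ipa() {
    grep -m1 'de.ipa' | sed 's|]</td>||; s|.*>\[||'
}
```

Usage: `curl -s "$(dad_url Olivenbaum)" | extract_dad_ipa`. For a whole list, the same one-request-per-pause loop as for Wiktionary applies.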