I've got a small handful of competitive word games in progress, and while the preference is for (mostly asynchronous) play against other human opponents, I'd like to give players the option of playing against an AI. I have my dictionary and I can easily give the AI full dictionary knowledge while it's playing, but my concern is that an AI that regularly plays words the player isn't familiar with will be a frustrating experience: 'I would have won that game if it'd just used words I know!' That holds even if the AI's overall skill level is turned down.
I'd rather create a weaker AI through a combination of (un)tuned play parameters and a weaker vocabulary, but I'm not sure how to limit that vocabulary to 'common' words. I've looked at several word frequency lists (for instance, the list of all words that appear in the Project Gutenberg books, sorted by number of occurrences), but they all have a number of false negatives: words that everyone knows but that simply don't show up with any real frequency (for instance, CHEETAH shows up less frequently in the PG texts than VOCATIVE or SUTTEE). I've tried using search result counts to estimate a word's popularity, but those are also prone to spurious mis-estimates, and of course it's hard to get search results for an entire dictionary without running afoul of the search engines' terms of service.
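To make the problem concrete, here is a minimal sketch of the frequency-cutoff approach, assuming the counts have already been loaded into a word-to-count mapping. The function name `build_ai_vocabulary`, the `always_allow` override, and all of the counts are hypothetical and invented for illustration; they are not taken from the actual Project Gutenberg list.

```python
from collections import Counter


def build_ai_vocabulary(dictionary, corpus_counts, min_count, always_allow=()):
    """Return the subset of dictionary words the AI is allowed to play.

    A word is kept if its corpus count reaches min_count, or if it is
    listed in always_allow (a hand-maintained patch for false negatives).
    """
    allowed = {w.upper() for w in always_allow}
    vocab = set()
    for word in dictionary:
        w = word.upper()
        if corpus_counts.get(w, 0) >= min_count or w in allowed:
            vocab.add(w)
    return vocab


# Toy data: these counts are made up, chosen only to mirror the
# CHEETAH vs. VOCATIVE/SUTTEE ordering described above.
dictionary = ["CHEETAH", "VOCATIVE", "SUTTEE", "HOUSE"]
corpus_counts = Counter({"HOUSE": 5000, "VOCATIVE": 40, "SUTTEE": 35, "CHEETAH": 12})

# A plain cutoff keeps the obscure words and drops the familiar one.
naive = build_ai_vocabulary(dictionary, corpus_counts, min_count=30)

# A manual override fixes this one word, but does not scale to a full dictionary.
patched = build_ai_vocabulary(
    dictionary, corpus_counts, min_count=30, always_allow=["CHEETAH"]
)
```

With these toy numbers, no single threshold admits CHEETAH while excluding VOCATIVE and SUTTEE, which is the false-negative problem in miniature.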
Does anyone have suggestions on other good means of determining a rough frequency of word usage, or other ways of limiting word game AI that will feel natural to players?
Of course, this would mean you'd need an "AI dictionary" for every player.
– Joel Jul 18 '12 at 06:50