How do we create a good agent that does not outperform humans?

Question

A lot of research has been done to create the optimal (or "smartest") RL agent, using methods such as A2C. An agent can now beat humans at playing Go, Chess, Poker, Atari Games, DOTA, etc. But I think these kinds of agents will never be a friend of humans, because humans won't play with an agent that always beats them.

How could we create an agent that doesn't outperform humans but still has human-level skill, so that when it plays against a human, the human is still motivated to beat it?

score 3 · Accepted Answer · answered Feb 07 '19 at 09:20

You basically have to degrade the result, assuming that the machine always finds the best move. There are a number of possibilities:

restrict the depth of searching. In early chess programs I believe that was the main way of regulating the difficulty. You stop the evaluation of moves after a particular depth in your search tree has been reached. This would be equivalent to only looking ahead two moves instead of twenty.
set a time limit. This is somewhat similar to restricting the depth of the search, but more generally applicable. If your algorithm accumulates candidate moves, and the general tendency is to get to the better moves after first finding a number of weaker ones, then you can stop at a given point in time and return the best move found so far.
distort available information. This might not be that applicable to games such as chess, but you could restrict the information the machine has available for evaluating moves. Something like the "Fog of War" often used in strategy games. With incomplete information it is harder to find a good move, though it is not impossible, which makes it more challenging than, say, restricting the depth of search too much.
sub-optimal evaluation function. If you have a function that evaluates the quality of a move, simply fudge that function to not return the best value. Perhaps add a random offset to the return value to make it less deterministic/predictable.
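
As a rough illustration of the first and last options, here is a minimal sketch that combines a depth limit with a noisy move score. Everything in it is an assumption made for the example and not taken from any particular engine: the toy game (a pile of stones, each player removes 1 to 3, whoever takes the last stone wins) and the names `negamax`, `choose_move`, `max_depth` and `noise` are all hypothetical.

```python
import random

# Toy game assumed for illustration: players alternately remove 1-3 stones,
# and whoever takes the last stone wins.
MOVES = (1, 2, 3)

def negamax(pile, depth):
    """Value of `pile` for the player to move: +1 win, -1 loss, 0 unknown."""
    if pile == 0:
        return -1.0  # the opponent just took the last stone and won
    if depth <= 0:
        return 0.0   # search cut off here: no opinion about this position
    return max(-negamax(pile - m, depth - 1) for m in MOVES if m <= pile)

def choose_move(pile, max_depth, noise=0.0, rng=random):
    """Pick a move with a depth-limited search plus Gaussian noise on each score.

    `max_depth` and `noise` are the two hypothetical difficulty knobs:
    a smaller depth makes the agent short-sighted, a larger noise makes
    it misjudge moves more often.
    """
    scored = []
    for m in MOVES:
        if m <= pile:
            score = -negamax(pile - m, max_depth - 1)
            scored.append((score + rng.gauss(0.0, noise), m))
    best = max(s for s, _ in scored)
    # Break exact ties at random so the weak agent is not predictable.
    return rng.choice([m for s, m in scored if s == best])
```

With a pile of 5, `choose_move(5, max_depth=3)` sees the win and takes 1 stone, whereas `choose_move(5, max_depth=1)` cannot tell the three moves apart and picks one at random. Raising `noise` weakens the agent gradually instead of all at once, which is one way to look for the balance between "weaker but consistent" and "random".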

There are probably other methods as well; the tricky part is to tread the fine line between appearing to be a weaker (but consistent) player, and just being a random number generator.

I think you can add the main problem detail in the "tricky" part: There is a problem of measuring capability against humans. It is not easy to automate, because if you can automate a human-level opponent to test with then you have already solved your problem! So it is a slow process of testing the bot with human subjects to see if it is enjoyable to play against. — Neil Slater, Feb 07 '19 at 10:27
@NeilSlater Yes, agree 100%. Enjoyment is hard to quantify... — Oliver Mason, Feb 07 '19 at 13:42
