
So I haven't read through every reinforcement learning algorithm, so I am curious. The model-free methods I have read about so far basically just try things and see what works and what doesn't. They then assign each action a value based on how good it seems, and then just choose the actions with the highest value. This is basically like pressing random buttons in Pac-Man, seeing which sequences of buttons work best given where you are and where the ghosts are, and then doing that. The agent isn't learning that staying away from the ghosts is the point, and pressing buttons that keep its distance from the ghosts large. I guess both approaches eventually reach the same result, but I think learning the method is faster, and I think this matters even more in the real world. For example, in race car driving, the methods I have read about would just turn in the direction the corner is turning based on a bunch of inputs. But a person understands their speed, understands what happens because of their speed, and can predict the outcomes of decisions without having to try them. This lets them transfer their learning more easily and learn things step by step more efficiently.
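To make the "try things and value them" loop concrete, here is a minimal sketch of tabular Q-learning on a made-up 5-state corridor (the environment, the reward, and all the constants are assumptions for illustration, not any standard benchmark):

```python
import random

N_STATES, ACTIONS = 5, (-1, +1)      # actions: step left / step right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.3    # learning rate, discount, exploration

def step(s, a):
    """Made-up world: reward 1 for reaching the rightmost state."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s = random.randrange(N_STATES - 1)
    for _ in range(20):
        # press a random button sometimes, otherwise the best-valued one
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r = step(s, a)
        # value the (state, action) pair by how well it worked out --
        # nothing in this update represents *why* it worked
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2
        if r > 0:
            break
```

Note that the learned table only says "pressing right here has been good"; it contains no statement like "the goal is on the right", which is exactly the distinction being asked about.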

So is there a reinforcement learning method that learns a model of how the world works first, and only then starts evaluating states and choosing actions? And is such a method better than current model-free methods that don't try to learn the model?
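The "learn the model first, then plan" idea can be sketched on the same kind of made-up 5-state corridor (again, every name and number here is an assumption for illustration): the agent records what each action did to its state and what reward it got, then computes a policy entirely inside that learned model with value iteration, with no further trial and error in the environment.

```python
N_STATES, ACTIONS, GAMMA = 5, (-1, +1), 0.9

def step(s, a):
    """The true world, unknown to the agent."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

# 1) Experience: do stuff and see what it does to the state. This toy world
#    is deterministic, so one sample per (state, action) pair is enough.
T, R = {}, {}
for s in range(N_STATES):
    for a in ACTIONS:
        T[(s, a)], R[(s, a)] = step(s, a)

# 2) Planning: evaluate states purely by simulating the learned model
#    (value iteration), without touching the environment again.
V = [0.0] * N_STATES
for _ in range(100):
    V = [max(R[(s, a)] + GAMMA * V[T[(s, a)]] for a in ACTIONS)
         for s in range(N_STATES)]

policy = [max(ACTIONS, key=lambda a: R[(s, a)] + GAMMA * V[T[(s, a)]])
          for s in range(N_STATES)]   # "step right" in every state
```

The point of the split is that once the model is learned, new goals or changed rewards only require re-running the cheap planning step, which matches the intuition about transfer in the question.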

Edit: I thought about it a bit more. Do humans learn how the world works by doing things and seeing what they do to their current "state"? If so, that is kind of what these methods are doing. But this doesn't hold up for the Pac-Man example. A human playing, even without knowing the rules, can quickly recognize that being touched by a ghost is bad and alter their decision making by simply making sure they stay far away from the ghosts. In my mind, though, the agent would go through lots and lots of iterations and decide how to proceed based on a bunch of different exact parameters, and there are thousands, maybe even millions, of combinations of observations that make up a single state. A human can quickly apply this newfound model and know how to avoid the ghosts anywhere, and through experience they can quickly refine it. Humans can also generalize a function and just slightly alter it for another situation.
