Why does reinforcement learning using a non-linear function approximator diverge when using strongly correlated data as input?

Question

While reading the DQN paper, I found that randomly selecting stored samples and learning from them reduced divergence in RL using a non-linear function approximator (e.g., a neural network).

So, why does Reinforcement Learning using a non-linear function approximator diverge when using strongly correlated data as input?

Read chapter 11 of this book. This is only a draft; if you can find the full book, even better. Also, I think similar questions have already been answered, so try searching the website a bit. — Brale, Feb 11 '20 at 08:19
Maybe this is a duplicate of "Why doesn't Q-learning converge when using function approximation?" — nbro, Feb 11 '20 at 16:05

score 2 · Answer 1 · answered Jul 23 '20 at 14:01

The problem is not so much the use of reinforcement learning to train the neural network as the assumptions that standard neural network training makes about its data: samples are assumed to be independent and identically distributed (i.i.d.). Standard feed-forward networks do not handle strongly correlated data well, which is one of the motivations for introducing recurrent neural networks, as they can handle such correlated data well.
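To make the correlation issue concrete, here is a minimal sketch, assuming a toy 1-D random walk as a stand-in for an agent's trajectory (this is an illustration, not code from the DQN paper). The helper names `pearson` and `random_walk` are hypothetical. It compares how correlated neighbouring training samples are when they are consumed in the order they were generated versus when they are drawn in random order from a stored buffer, which is the idea behind experience replay.

```python
import random
from collections import deque


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_walk(n_steps, rng):
    """Toy trajectory (an assumption): each state is the previous one plus noise."""
    state, states = 0.0, []
    for _ in range(n_steps):
        state += rng.gauss(0.0, 1.0)
        states.append(state)
    return states


rng = random.Random(0)

# Store the trajectory in a bounded buffer, as a replay memory would.
buffer = deque(random_walk(10_000, rng), maxlen=10_000)

# Online order: each sample is followed by the state visited right after it.
online = list(buffer)
corr_online = pearson(online[:-1], online[1:])

# Replay order: draw the stored samples uniformly at random (here, a full shuffle).
replayed = rng.sample(online, k=len(online))
corr_replay = pearson(replayed[:-1], replayed[1:])

print(f"correlation of neighbouring samples, online order: {corr_online:.3f}")
print(f"correlation of neighbouring samples, replay order: {corr_replay:.3f}")
```

In the online order, neighbouring samples are almost perfectly correlated, so successive gradient updates all push in nearly the same direction; after random sampling from the buffer, neighbouring samples are close to uncorrelated, which is much nearer to the i.i.d. assumption mentioned above.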
