
I am trying to build a DQN model. During training, the action values differ from step to step, but most of the time the same action is still selected. How can I solve this?

The replay memory size is 1000, the batch size is 32, the learning rate is 0.0025, epsilon starts at 1.0 with an epsilon decay of 0.98, and the discount factor is 0.98.

The activation function is ReLU for the hidden layer and linear for the output layer.
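
For reference, here is a minimal sketch of epsilon-greedy action selection with the epsilon settings listed above. It is only an illustration under assumptions: the question does not say whether epsilon is decayed per step or per episode, so the sketch assumes one multiplicative decay per call, and the names `epsilon_schedule`, `select_action`, and `floor` are hypothetical, not taken from the actual training code.

```python
import random

def epsilon_schedule(start=1.0, decay=0.98, steps=200, floor=0.0):
    """Return the epsilon value used at each of `steps` decay updates.

    Assumption: epsilon is multiplied by `decay` once per update and is
    never allowed to drop below `floor` (hypothetical parameter).
    """
    eps = start
    values = []
    for _ in range(steps):
        values.append(eps)
        eps = max(floor, eps * decay)
    return values

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy choice over a list of Q-values."""
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

if __name__ == "__main__":
    schedule = epsilon_schedule(steps=201)
    # Epsilon after 0, 50, 100 and 200 decay updates.
    for i in (0, 50, 100, 200):
        print(i, round(schedule[i], 4))
```

If the decay is applied on every environment step, epsilon falls below 0.02 after about 200 steps under these settings, so selection becomes almost entirely greedy very early in training. If it is applied once per episode, exploration lasts correspondingly longer.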

  • What sort of action selection strategy are you using? (EGreedy, EGreedy Linear, EGreedy Exp, etc) – krm76 Sep 27 '22 at 11:37
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Sep 27 '22 at 14:31

0 Answers