I am trying to build a DQN model. During training, the predicted action values (Q-values) change from step to step, but most of the time the same action ends up being selected. How can I solve this?
The replay memory size is 1000, the batch size is 32, the learning rate is 0.0025, epsilon starts at 1.0 with a decay factor of 0.98, and the discount factor is 0.98.
The activation function for the hidden layers is ReLU, and the output layer is linear.
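
To make the exploration settings concrete, here is a minimal sketch of epsilon-greedy action selection with a multiplicative decay, using the values quoted above (epsilon 1.0, decay 0.98). It is only an illustration under assumptions: the post does not say whether the decay is applied per step or per episode, so the sketch simply counts "decay updates", and the names `epsilon_greedy` and `updates_until_below` as well as the 0.01 threshold are hypothetical, not taken from the actual training code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the argmax action."""
    if rng.random() < epsilon:
        # Explore: uniform random action index.
        return rng.randrange(len(q_values))
    # Exploit: index of the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def updates_until_below(start, decay, threshold):
    """Count multiplicative decay updates until epsilon falls below threshold."""
    epsilon, updates = start, 0
    while epsilon >= threshold:
        epsilon *= decay
        updates += 1
    return updates


if __name__ == "__main__":
    # Values from the post; the 0.01 threshold is an assumed cut-off.
    n = updates_until_below(start=1.0, decay=0.98, threshold=0.01)
    print("decay updates until epsilon < 0.01:", n)

    # With epsilon = 0 the choice is purely greedy, so the same action is
    # returned for as long as its Q-value stays the largest.
    print("greedy action:", epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))
```

If the decay is applied once per environment step rather than once per episode, epsilon becomes very small after a few hundred steps, and from then on the agent picks the greedy action almost every time. That would look like "the same action is always selected" even though the Q-values themselves keep changing, so it may be worth checking where the decay is applied.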