
I have created the virtual environment and the Q-table, initialized the Q-learning parameters, and written a training module whose results I store in a NumPy array. After training completes, the Q-table is updated and I can plot the exploration. But how do I code the exploration rate decay? Here is my sample code for each step of the training module:

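for step in range(max_steps):
    # Draw a random number to decide between exploiting and exploring
    exploration_rate_threshold = random.uniform(0, 1)
    if exploration_rate_threshold > exploration_rate:
        # Exploit: take the best known action for the current state
        action = np.argmax(q_table[state, :])
    else:
        # Explore: take a random action
        action = env.action_space.sample()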

Milan
mogoja

1 Answer


Here is one way to calculate the exploration rate decay:

exploration_rate = min_exploration_rate + \
    (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate*episode)
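To show where this update might sit, here is a minimal sketch that applies the formula once per episode, after the per-step loop from the question has finished. The hyperparameter values, `num_episodes`, the `decayed_exploration_rate` helper, and the `rates` list are assumptions made up for illustration; they are not from the question. The environment interaction is left out so the snippet runs on its own.

```python
import numpy as np

# Hypothetical hyperparameter values, chosen only for illustration.
min_exploration_rate = 0.01
max_exploration_rate = 1.0
exploration_decay_rate = 0.001
num_episodes = 10_000


def decayed_exploration_rate(episode):
    """Exponentially decay the exploration rate from max toward min."""
    return min_exploration_rate + (
        max_exploration_rate - min_exploration_rate
    ) * np.exp(-exploration_decay_rate * episode)


exploration_rate = max_exploration_rate
rates = []  # exploration rate used in each episode, kept for plotting

for episode in range(num_episodes):
    rates.append(exploration_rate)

    # The per-step loop from the question would go here; it reads
    # exploration_rate to choose between np.argmax and a random action.

    # Decay once per episode, after the step loop has finished.
    exploration_rate = decayed_exploration_rate(episode)
```

With these assumed values the rate starts at `max_exploration_rate` and approaches `min_exploration_rate` without going below it. A larger `exploration_decay_rate` makes the agent switch from exploring to exploiting sooner.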
Rithik Banerjee