
I want to implement this update rule (the SARSA update) in a voice-search application:

$$ Q(S, A) \leftarrow Q(S, A)+\alpha\left(R+\gamma Q\left(S^{\prime}, A^{\prime}\right)-Q(S, A)\right) $$

The agent is also restricted to an $\epsilon$-greedy policy based on a given Q-function and epsilon. In short, I need an $\epsilon$-greedy policy to use while updating my Q-table.

Milan

1 Answer


Try returning a function that takes the state as input and returns the probability of each action as a numpy array whose length is the size of the action space (the set of possible actions). Here is one attempt:

import numpy as np

def EpsilonGreedyPolicy(Q, epsilon, no_of_actions):
    def policy(state):
        # Every action gets a baseline probability of epsilon / |A| (exploration)...
        probabilities = np.ones(no_of_actions, dtype=float) * epsilon / no_of_actions
        # ...and the greedy action gets the remaining (1 - epsilon) mass (exploitation).
        best_action = np.argmax(Q[state])
        probabilities[best_action] += 1.0 - epsilon
        return probabilities
    return policy
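To tie this back to the update rule in the question, here is a self-contained sketch of one SARSA step driven by such a policy. The Q-table shape, reward, and next state are made-up placeholders, since no environment is shown in the question; the policy factory is repeated so the snippet runs on its own.

```python
import numpy as np

# Hypothetical tiny Q-table: 3 states x 2 actions, all zeros to start.
Q = {s: np.zeros(2) for s in range(3)}

def EpsilonGreedyPolicy(Q, epsilon, no_of_actions):
    def policy(state):
        # epsilon / |A| baseline for every action, extra mass on the greedy one.
        probabilities = np.ones(no_of_actions, dtype=float) * epsilon / no_of_actions
        probabilities[np.argmax(Q[state])] += 1.0 - epsilon
        return probabilities
    return policy

policy = EpsilonGreedyPolicy(Q, epsilon=0.1, no_of_actions=2)

# One SARSA step: sample A ~ pi(S), observe R and S', sample A' ~ pi(S'),
# then apply Q(S,A) <- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A)).
alpha, gamma = 0.5, 0.9
rng = np.random.default_rng(0)

S = 0
A = rng.choice(2, p=policy(S))
R, S_next = 1.0, 1                        # placeholder environment feedback
A_next = rng.choice(2, p=policy(S_next))
Q[S][A] += alpha * (R + gamma * Q[S_next][A_next] - Q[S][A])
# With Q initialized to zeros, Q[0][A] is now 0.5 regardless of which A was drawn.
```

Sampling the action with `rng.choice(..., p=policy(state))` is what makes the behavior $\epsilon$-greedy: with probability $1-\epsilon$ plus its share of $\epsilon$, the greedy action is taken, and every other action keeps probability $\epsilon/|A|$.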

Rithik Banerjee