I want to implement this function on a voice searching application:
$$ Q(S, A) \leftarrow Q(S, A)+\alpha\left(R+\gamma Q\left(S^{\prime}, A^{\prime}\right)-Q(S, A)\right) $$
And also restricted to use epsilon-greedy policy based on a given Q-function and epsilon. I simply need a $\epsilon$-greedy policy for updating my q-table.