Questions tagged [q-learning]

For questions related to the Q-learning algorithm, a model-free, temporal-difference reinforcement learning algorithm that attempts to approximate the Q function, which, given a state s and an action a, returns a real number representing the expected return obtained by taking action a in state s (and thereafter acting optimally). Q-learning was introduced in the PhD thesis "Learning from Delayed Rewards" (1989) by Watkins.

For more info, see, e.g., the book Reinforcement Learning: An Introduction (2nd edition) by Sutton and Barto, the related Wikipedia article, or http://artint.info/html/ArtInt_265.html
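Many questions under this tag concern the tabular update rule described above. A minimal sketch of it in Python (the toy sizes, hyperparameters, and function names below are illustrative, not taken from any particular question on this page):

```python
import numpy as np

# Illustrative toy dimensions and hyperparameters.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.99, 0.1

Q = np.zeros((n_states, n_actions))  # Q[s, a] ~ expected return of taking a in s
rng = np.random.default_rng(0)

def epsilon_greedy(s):
    # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[s]))

def q_update(s, a, r, s_next):
    # Watkins' update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```

The max over next-state actions is what makes this off-policy: the target uses the greedy value of s' regardless of which action the behavior policy actually takes next.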

389 questions
3
votes
1 answer

Maximum Q value for new state in Q-Learning never exists

I'm working on implementing a Q-Learning algorithm for a 2-player board game. I encountered what I think may be a problem. When it comes time to update the Q value with the Bellman equation, the last part states that for the maximum…
Pete
  • 45
  • 5
2
votes
1 answer

Is Q-learning limited to just visual scenarios, or is it much broader and can it be used to solve non-visual problems as well?

For example, this article: https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/, which explains Q-learning via the Smartcab problem, has a visual environment of a 5x5 grid, where the agent will…
will The J
  • 267
  • 6
1
vote
1 answer

Can Q-learning be used to create new creative solutions by combining different factors and characteristics?

References from Wikipedia: https://en.wikipedia.org/wiki/Q-learning https://en.wikipedia.org/wiki/Markov_decision_process Q-learning can be used to create new creative solutions, combining different behaviors, characteristics, reactions, facts,…
will The J
  • 267
  • 6
1
vote
1 answer

In Q-learning, do states need to be just the X and Y positions of the agent, or can a state include several other characteristics?

For example, in this article: https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/, which explains Q-learning via the Smartcab problem, the environment is a 5x5 grid. In this example, states are positions…
will The J
  • 267
  • 6
1
vote
2 answers

In Q-learning, am I the one who defines the way in which actions allow the agent to interact with the environment? And do the interactions vary?

In Q-learning, am I the one who defines the way in which actions allow the agent to interact with the environment, so that this interaction can vary greatly from the problem in…
will The J
  • 267
  • 6
1
vote
1 answer

How can I fetch the exploration decay rate of an iterable Q-table in Python?

I have created the virtual environment, created the Q-table, and initialized the Q-parameters; then I made a training module and stored it in a NumPy array. After completing training, I updated the Q-table and now I get the plots for the…
mogoja
  • 73
  • 5
0
votes
0 answers

Bias of multi-step Q-learning in the tabular case

This comes from CS285 (2023 Fall), HW3. I'm learning RL by myself and I can't find answers related to this question. The resources can be found at https://rail.eecs.berkeley.edu/deeprlcourse/. Background: consider the N-step variant of Q-learning…
yeebo xie
  • 45
  • 5
0
votes
1 answer

How can I update my Q-table in Python?

I want to implement this update rule in a voice-search application: $$ Q(S, A) \leftarrow Q(S, A)+\alpha\left(R+\gamma Q\left(S^{\prime}, A^{\prime}\right)-Q(S, A)\right) $$ I am also restricted to using an epsilon-greedy policy based on a given…
mogoja
  • 73
  • 5
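The update rule quoted in the last question above uses Q(S', A'), the value of the next action actually taken, which is the on-policy SARSA form of the temporal-difference update; Q-learning would replace that term with max over actions of Q(S', a). A hedged sketch of applying that update to a NumPy Q-table, with epsilon-greedy action selection (all names here are illustrative, not from the question):

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # Q(S,A) <- Q(S,A) + alpha * (R + gamma * Q(S',A') - Q(S,A))
    # Using Q[s_next, a_next] makes this the on-policy SARSA update;
    # Q-learning would use Q[s_next].max() instead.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
    return Q

def epsilon_greedy(Q, s, epsilon, rng):
    # Pick a random action with probability epsilon, else the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))
    return int(np.argmax(Q[s]))
```

Because the target uses the action the policy will actually take, A' must be chosen (e.g. by `epsilon_greedy`) before the update is applied.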