
This may be a very fundamental question, but somehow I can't decide.

I have a graph that the user traverses by taking several actions, and there are multiple nodes with rewards. When I run the MDP process, it ends up reaching the first target (the one it reaches first) repeatedly and never finds the others. So now I remove the reward value of a node once its reward has been collected. Is this a correct approach? If not, what should I do instead?

Thanks in advance. Kind regards, Ferda.


1 Answer


So now I remove the reward value of a node once its reward has been collected. Is this a correct approach? If not, what should I do instead?

Partially correct. For the environment to remain a valid MDP (and keep making sense to the agent), you need to add a state variable that represents this change. A list of booleans, "goals reached so far", would be one simple way to do it. Note that each goal whose reward toggles on or off depending on history doubles the state space.
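As a minimal sketch of that idea (the graph, node names, and reward values below are illustrative assumptions, not from the question), the state becomes a pair of the current node and a tuple of "collected" flags, and a reward is paid only the first time its node is visited:

```python
import random

# Hypothetical graph: adjacency lists over four nodes.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
GOALS = ("B", "D")              # nodes carrying a one-time reward
REWARDS = {"B": 1.0, "D": 5.0}  # assumed reward values

def initial_state():
    # State = (current node, tuple of "goal collected" booleans).
    # The boolean tuple is what restores the Markov property.
    return ("A", (False,) * len(GOALS))

def step(state, next_node):
    node, collected = state
    assert next_node in GRAPH[node], "invalid move"
    reward = 0.0
    collected = list(collected)
    if next_node in GOALS:
        i = GOALS.index(next_node)
        if not collected[i]:
            reward = REWARDS[next_node]  # pay the reward once
            collected[i] = True          # record it in the state
    return (next_node, tuple(collected)), reward

# Example: a short random walk over the augmented state space.
state = initial_state()
total = 0.0
for _ in range(10):
    node, _ = state
    state, r = step(state, random.choice(GRAPH[node]))
    total += r
print("collected goals:", state[1], "total reward:", total)
```

With two goals the augmented state space is 4 nodes × 2² flag combinations = 16 states, which is the doubling-per-goal cost mentioned above; any standard MDP solver can then be run over these (node, flags) states unchanged.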

Neil Slater
  • Dear Neil, thanks for the response. I already had the impact_reached booleans for the nodes, so that part is OK. I also have other related questions, but I will ask them separately. Thanks again – Ferda-Ozdemir-Sonmez Jul 18 '22 at 10:43
  • Dear Neil, I have another question related to this one: https://ai.stackexchange.com/questions/36381/markov-decision-process-how-to-get-the-correct-policy-if-targets-are-reached-onc Could you please take a look if you can? – Ferda-Ozdemir-Sonmez Jul 19 '22 at 13:21