
This may be a very fundamental question, but somehow I can't decide.

I have a graph that the user traverses by taking several actions, and there are multiple nodes with rewards. When I run the MDP process, it ends up reaching the first target (the one it reaches first) repeatedly and never finds the others. So now I remove the reward value of a node once its reward has been collected. Is this a correct approach? If not, what should I do instead?

Thanks in advance. Kind regards, Ferda.


1 Answer


So now I remove the reward value of a node once its reward has been collected. Is this a correct approach? If not, what should I do instead?

Partially correct. For the environment to remain a valid MDP (and keep making sense to the agent), you need to add a state variable that represents this change. A list of booleans, "goals reached so far", would be one simple way to do it. Note that each goal whose reward toggles on or off depending on history doubles the state space.
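As a minimal sketch of that idea (the graph, node names, and reward values below are illustrative assumptions, not from the question), the state becomes a pair of the current node and a tuple of "collected" flags, and a reward is paid only the first time its node is visited:

```python
import random

# Hypothetical graph: adjacency lists over four nodes.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C"],
}
GOALS = ("B", "D")              # nodes carrying a one-time reward
REWARDS = {"B": 1.0, "D": 5.0}  # assumed reward values

def initial_state():
    # State = (current node, tuple of "goal collected" booleans).
    # The boolean tuple is what restores the Markov property.
    return ("A", (False,) * len(GOALS))

def step(state, next_node):
    node, collected = state
    assert next_node in GRAPH[node], "invalid move"
    reward = 0.0
    collected = list(collected)
    if next_node in GOALS:
        i = GOALS.index(next_node)
        if not collected[i]:
            reward = REWARDS[next_node]  # pay the reward once
            collected[i] = True          # record it in the state
    return (next_node, tuple(collected)), reward

# Example: a short random walk over the augmented state space.
state = initial_state()
total = 0.0
for _ in range(10):
    node, _ = state
    state, r = step(state, random.choice(GRAPH[node]))
    total += r
print("collected goals:", state[1], "total reward:", total)
```

With two goals the augmented state space is 4 nodes × 2² flag combinations = 16 states, which is the doubling-per-goal cost mentioned above; any standard MDP solver can then be run over these (node, flags) states unchanged.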

Neil Slater
  • Dear Neil, thanks for the response. I already had the impact_reached booleans for the nodes, so that part is OK. I also have other related questions, but I will ask them separately. Thanks again – Ferda-Ozdemir-Sonmez Jul 18 '22 at 10:43
  • Dear Neil, I have another question related to this one: https://ai.stackexchange.com/questions/36381/markov-decision-process-how-to-get-the-correct-policy-if-targets-are-reached-onc Could you please take a look if you can? – Ferda-Ozdemir-Sonmez Jul 19 '22 at 13:21