This may be a very basic question, but I can't quite decide on the right approach.
I have a graph that the user traverses by taking actions, and several nodes carry rewards. When I solve the MDP, the resulting policy repeatedly heads for the first reward node it can reach and never visits the others. As a workaround, I now remove a node's reward value once that node has been reached. Is this a correct approach? If not, what should I do instead?
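To make the situation concrete, here is a minimal sketch of the workaround described above. All names and values (the graph layout, reward nodes, discount factor) are illustrative assumptions, not my actual setup: a small deterministic graph MDP is solved by value iteration, and once a reward node is visited its reward is deleted and the values are recomputed, so the greedy policy starts heading for the next reward instead of circling the first one.

```python
GAMMA = 0.9  # assumed discount factor

# Hypothetical graph: node -> list of neighbouring nodes (each move is an action).
graph = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0, 4], 4: [3]}
rewards = {2: 1.0, 4: 5.0}  # two reward nodes; reward is earned on entering the node


def value_iteration(graph, rewards, iters=200):
    """Plain value iteration for this deterministic graph MDP."""
    V = {s: 0.0 for s in graph}
    for _ in range(iters):
        # Bellman backup: best neighbour by immediate reward + discounted value.
        V = {s: max(rewards.get(n, 0.0) + GAMMA * V[n] for n in graph[s])
             for s in graph}
    return V


def greedy_step(state, V, graph, rewards):
    """Move to the neighbour with the highest one-step lookahead value."""
    return max(graph[state], key=lambda n: rewards.get(n, 0.0) + GAMMA * V[n])


state, collected = 0, []
remaining = dict(rewards)
V = value_iteration(graph, remaining)
while remaining:
    state = greedy_step(state, V, graph, remaining)
    if state in remaining:
        collected.append(state)
        del remaining[state]          # "remove the reward value of that node"
        if remaining:                 # re-solve against the reduced reward set
            V = value_iteration(graph, remaining)

print(collected)  # the order in which the reward nodes were reached
```

Without the `del` line, the greedy policy would shuttle back and forth around the single most valuable reward node forever; with it, every reward node is eventually visited. Note that deleting rewards mid-episode effectively changes the MDP on the fly, which is why the values have to be recomputed after each collection.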
Thanks in advance. Kind regards, Ferda.