
I am trying to implement deep reinforcement learning for a defender-vs-attacker problem in which agents can be destroyed by enemies. I am writing both the environment and the RL algorithm myself. Each agent can observe its own state and the states of the other agents. Since the input size of a network must be constant, here is the question: how should the observation of a dead agent be handled? That is, if agent 1 dies, what should agent 2's observation of agent 1 be?

My first thought was to set the state to zero, but I don't think that is a good idea. Suppose an agent's state is [x, y, theta], normalized to [0, 1]. Then [0, 0, 0] is itself a valid state. If we instead use an out-of-range value such as -1, how can the network recognize that this is not a normal input from a living agent?

zhixin

2 Answers

2

Being dead is just another state of that agent, assuming the episode continues.

Using the existing state or observation vector with special values could work, but it may create extra work for any value function approximator. The usual way to solve this is to expand the observation vector with a new feature, e.g. 1 for alive and 0 for dead.

If the agent is also removed from any possible interactions, you should set the other variables to some standard fixed values as well, so that there is only one observation vector matching a dead agent. Exceptions to that rule might apply for static features describing an agent in general terms, such as team affiliation (knowing whether it is your own or an opposing team member that is out of action is important knowledge for predicting outcomes).
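To make this concrete, here is a minimal sketch of what such a constant-size observation could look like, assuming the per-agent state [x, y, theta] from the question plus a static team feature; the helper names (agent_features, build_observation) are purely illustrative, not part of any particular library:

```python
import numpy as np

def agent_features(agent):
    # Per-agent feature block: [alive_flag, x, y, theta, team].
    # If the agent is dead, the alive flag is 0 and the dynamic features
    # are pinned to a fixed value (0 here), so all dead agents of a given
    # team map to exactly one block. The static team feature is kept.
    if agent["alive"]:
        return np.array([1.0, agent["x"], agent["y"], agent["theta"], agent["team"]])
    return np.array([0.0, 0.0, 0.0, 0.0, agent["team"]])

def build_observation(observer, all_agents):
    # Own block first, then every other agent's block, concatenated,
    # giving a constant-size observation regardless of who is alive.
    blocks = [agent_features(observer)]
    blocks += [agent_features(a) for a in all_agents if a is not observer]
    return np.concatenate(blocks)
```

With this layout a dead agent cannot be confused with a living agent sitting at [0, 0, 0], because only dead agents have an alive flag of 0.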

This also has the advantage of supporting extensions to the environment as your agents become more sophisticated. For example, instead of being removed from the map, a dead agent might stay in place, and visiting its location could allow scavenging resources (you may then also need more observation features to indicate whether this has already been done).

Neil Slater
  • What do you mean by "making extra work for any value function approximator"? Do you think expanding the observation vector is a better choice than adding special values? – zhixin Sep 28 '23 at 02:13
  • @zhixin I mean that the function shape is usually more difficult to learn if it contains a big change at some special values, where otherwise it would have been simpler and smoother. An NN can learn the shape of e.g. a circle easily, but would find a circle with a single spike on one side slightly harder to learn. – Neil Slater Sep 28 '23 at 06:45
0

Your environment should handle it, not the agent.

If I'm playing a game and my character dies, then pressing buttons does nothing... it is not the player who is supposed to stop playing, it is the game that is supposed to stop you from doing anything.

In other words, the network can still observe whatever it is supposed to see; it is the environment and your training algorithm that should take special care with such a corner case.
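As a rough illustration of that idea, here is a minimal, self-contained sketch of a hand-written multi-agent environment that keeps handing out constant-size observations while simply ignoring the actions of dead agents; every name here (ToyBattleEnv, the dict keys, the placeholder reward scheme) is a made-up example rather than an existing API:

```python
import numpy as np

class ToyBattleEnv:
    # Minimal sketch: the environment, not the agent, handles death.
    # Dead agents' actions are ignored and their feature block is fixed,
    # so every agent always receives a constant-size observation.

    def __init__(self, n_agents=3):
        self.agents = [
            {"alive": True, "x": 0.5, "y": 0.5, "theta": 0.0, "team": float(i % 2)}
            for i in range(n_agents)
        ]

    def _block(self, agent):
        # [alive, x, y, theta, team]; dynamic features pinned to 0 when dead.
        if agent["alive"]:
            return [1.0, agent["x"], agent["y"], agent["theta"], agent["team"]]
        return [0.0, 0.0, 0.0, 0.0, agent["team"]]

    def _observation(self, i):
        # Own block first, then every other agent's block, concatenated.
        order = [self.agents[i]] + [a for j, a in enumerate(self.agents) if j != i]
        return np.concatenate([self._block(a) for a in order])

    def step(self, actions):
        # The environment ignores the actions of dead agents entirely.
        for agent, action in zip(self.agents, actions):
            if agent["alive"]:
                agent["x"] = float(np.clip(agent["x"] + 0.1 * action[0], 0.0, 1.0))
                agent["y"] = float(np.clip(agent["y"] + 0.1 * action[1], 0.0, 1.0))

        # (Combat resolution that sets agent["alive"] = False would go here.)

        obs = [self._observation(i) for i in range(len(self.agents))]
        rewards = [0.0 for _ in self.agents]  # placeholder reward scheme
        done = all(not a["alive"] for a in self.agents)
        return obs, rewards, done, {}
```

The policy network never has to know about death explicitly: it always receives the same-shaped vector, and the environment decides that a dead agent's action has no effect and that its feature block stays fixed.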

Alberto
  • I guess you misunderstood my question. I mean how to handle the observation of dead agents. Suppose we have 3 agents a, b, c. Agent a's obs is [a's state, b's state, c's state]. If b is destroyed, what should a's obs be? By the way, the environment is created by myself, so how should the environment give the agents observations when some other agents die? And this is not a corner case; agents are expected to be destroyed by others. – zhixin Sep 27 '23 at 15:11