I am trying to implement deep reinforcement learning for a defender-vs-attacker problem in which agents can be destroyed by enemies. I am coding both the environment and the RL algorithm. Each agent can observe its own state and the states of the other agents. Since the input size of a network must be constant, this raises the question: how should the observation of a dead agent be handled? That is, if agent1 dies, what should agent2's observation of agent1 be?
My first thought was to set the dead agent's state to zero, but I don't think that is a good idea. Take an agent's state to be [x, y, theta], normalized to [0, 1]. Then [0, 0, 0] is also a valid state for a living agent. And if we instead set the state to some strange out-of-range value like -1, how can the network recognize that this is not a normal input from a living agent?
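For concreteness, here is a minimal sketch of the zero-padding option I am describing (the helper name and the dict-based agent representation are just illustrative, not my actual code); it shows exactly where the ambiguity arises:

```python
import numpy as np

def build_joint_observation(agents):
    """Concatenate per-agent states into one fixed-size observation vector.

    agents: list of dicts like {"x": ..., "y": ..., "theta": ..., "alive": bool},
    with x, y, theta already normalized to [0, 1].
    """
    obs = []
    for agent in agents:
        if agent["alive"]:
            obs.extend([agent["x"], agent["y"], agent["theta"]])
        else:
            # Zero-pad the slot of a dead agent. This is ambiguous:
            # [0, 0, 0] is also a valid state of a LIVING agent sitting
            # at the origin with theta = 0.
            obs.extend([0.0, 0.0, 0.0])
    return np.asarray(obs, dtype=np.float32)
```

With this construction, a dead agent and a living agent at [0, 0, 0] produce identical network inputs, and replacing the zeros with a sentinel like -1 just moves the question to how the network is supposed to learn that -1 means "dead" rather than a weird position.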