I am working on an RL problem that I am trying to solve with a Deep Q-Network (DQN). The problem concerns choosing drivers to take specific taxi orders. Most existing work I am familiar with uses RL to determine which orders to take; I specifically look at the situation where we want to determine which driver takes a given order.
This means that the action space consists of the various drivers we can choose. Initially, we assume a fixed number of drivers to ensure a fixed action space.
My question is about defining the state space. First of all, the state space consists of information about the next order we are trying to assign to a driver from our action set. Besides that, we also want to incorporate state information about the different drivers (e.g. their location). However, this means we include state information about the actions as input to the DQN. The reason is that the state of the drivers is the main thing that changes when choosing a different action, and it therefore determines the choice we want to make at the next timestep. I am, for example, thinking about creating a list of size |drivers| with element i defining the location of driver i.
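To make the idea concrete, here is a minimal sketch of how such a state vector could be assembled. The dimensions and feature names (`ORDER_FEATURES`, `NUM_DRIVERS`, the specific order attributes) are assumptions for illustration, not part of the original question:

```python
import numpy as np

# Hypothetical dimensions: a few order features plus 2D driver locations.
NUM_DRIVERS = 5
ORDER_FEATURES = 4  # e.g. pickup x/y, trip distance, price (illustrative)

def build_state(order_features, driver_locations):
    """Concatenate the current order's features with every driver's
    location into one flat state vector for the DQN input."""
    order = np.asarray(order_features, dtype=np.float32)
    drivers = np.asarray(driver_locations, dtype=np.float32).ravel()
    return np.concatenate([order, drivers])

# Example: one order and five drivers at random 2D positions.
order = [0.1, 0.9, 3.2, 12.5]               # made-up order features
locations = np.random.rand(NUM_DRIVERS, 2)  # (x, y) per driver
state = build_state(order, locations)
print(state.shape)  # (ORDER_FEATURES + 2 * NUM_DRIVERS,) = (14,)
```

Element i of the driver part of this vector then plays the role of the "state of action i" described above.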
I tried to find existing work that uses a similar setting (i.e. that incorporates action states in the state input), but I have not succeeded so far. Therefore I am wondering:
- Is this a logical/reasonable approach to the problem?
- If so, is anyone familiar with existing work that uses a comparable approach?
I am familiar with works that take (state, action) as input, i.e. the full pair of the state s and the action a, and then produce a single Q(s, a) for each specific pair. This is an approach we do not want to take, since it requires |A(s)| passes through the network instead of a single pass (as explained here).
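The single-pass alternative can be sketched as a network that maps the full state (order features plus all driver states) to one Q-value per driver, so selecting a driver is a single argmax. The tiny NumPy MLP below uses random weights as a stand-in for a trained network; all sizes are assumptions for illustration:

```python
import numpy as np

NUM_DRIVERS = 5
STATE_DIM = 14  # assumed: order features plus all driver locations
HIDDEN = 32

rng = np.random.default_rng(0)
# Randomly initialised weights stand in for a trained DQN.
W1 = rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, NUM_DRIVERS)) * 0.1
b2 = np.zeros(NUM_DRIVERS)

def q_values(state):
    """One forward pass mapping the state to a Q-value per driver,
    so action selection is a single argmax rather than |A(s)| passes."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

state = rng.standard_normal(STATE_DIM)
q = q_values(state)             # one Q-value per driver
best_driver = int(np.argmax(q)) # chosen action in a single pass
print(q.shape)  # (5,)
```

This is the standard DQN output layout (one head per discrete action); the only unusual part here is that the per-action information also appears inside the state input.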
x_1 would change, since the state of the driver changes. – Stef Aug 12 '21 at 15:30

In case vector x_0 only includes information about the location, the change to this vector is clear. However, it is preferable to include e.g. the order price in vector x_0. How does the order vector x_0 then change? For each order appearing, we do a forward pass through the network. The next order is therefore not yet known when assigning an order to a driver. My main struggle is that we want to incorporate the characteristics of the specific order and base the action on this, but that these cannot be incorporated in a new state. – Stef Aug 14 '21 at 16:20
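One way to handle the "next order is not yet known" issue in the comments, sketched below purely as an assumption (it is not proposed in the question itself), is to hold the transition open: store (s, a, r) when a driver is assigned, and only complete it with s' once the next order arrives and the next state can be built. All names here are hypothetical:

```python
import numpy as np

replay_buffer = []
pending = None  # (state, action, reward) awaiting its next state

def assign_order(state, action, reward):
    """Record the assignment; the transition stays open because the
    next order (and hence the next state) is not yet observed."""
    global pending
    pending = (state, action, reward)

def on_next_order(next_state):
    """Complete the pending transition now that s' is known, and
    push the full (s, a, r, s') tuple to the replay buffer."""
    global pending
    if pending is not None:
        s, a, r = pending
        replay_buffer.append((s, a, r, next_state))
        pending = None

s0 = np.zeros(4)  # toy state when the first order is assigned
s1 = np.ones(4)   # toy state built when the next order appears
assign_order(s0, action=2, reward=1.0)
on_next_order(s1)
print(len(replay_buffer))  # 1
```

Since DQN is off-policy and learns from a replay buffer, delaying the completion of a transition by one order does not change the learning algorithm itself.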