I am working on an RL problem that I am trying to solve with a Deep Q-Network (DQN). The problem concerns choosing drivers to take specific taxi orders. Most existing work I am familiar with uses RL to determine which orders to take; I specifically look at the situation where we want to determine which driver takes a given order.
This means that the action space consists of the various drivers we can choose. Initially, we assume a fixed number of drivers to ensure a fixed action space.
My question is about defining the state space. First of all, the state space consists of information about the next order we are trying to assign to a driver from our action set. Besides that, we also want to incorporate state information about the different drivers (e.g. their location). However, this means we include state information about the actions as input to the DQN. The reason is that the state of the drivers is the main thing that changes when choosing a different action, and it therefore determines the choice we want to make at the next timestep. I am, for example, thinking about creating a list of size |drivers| with element i defining the location of driver i.
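To make the idea concrete, here is a minimal sketch of how such a state vector could be assembled. The dimensions and feature names (`ORDER_FEATURES`, `NUM_DRIVERS`, the specific order attributes) are assumptions for illustration, not part of the original question:

```python
import numpy as np

# Hypothetical dimensions: a few order features plus 2D driver locations.
NUM_DRIVERS = 5
ORDER_FEATURES = 4  # e.g. pickup x/y, trip distance, price (illustrative)

def build_state(order_features, driver_locations):
    """Concatenate the current order's features with every driver's
    location into one flat state vector for the DQN input."""
    order = np.asarray(order_features, dtype=np.float32)
    drivers = np.asarray(driver_locations, dtype=np.float32).ravel()
    return np.concatenate([order, drivers])

# Example: one order and five drivers at random 2D positions.
order = [0.1, 0.9, 3.2, 12.5]               # made-up order features
locations = np.random.rand(NUM_DRIVERS, 2)  # (x, y) per driver
state = build_state(order, locations)
print(state.shape)  # (ORDER_FEATURES + 2 * NUM_DRIVERS,) = (14,)
```

Element i of the driver part of this vector then plays the role of the "state of action i" described above.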
I tried to find existing work that uses a similar setting (i.e. that incorporates action states in the state input), but I have not succeeded so far. Therefore I am wondering:
- Is this a logical/reasonable approach to the problem?
- If so, is anyone familiar with existing work that uses a comparable approach?
I am familiar with works that take (state, action) as input, i.e. the full pair of the state s and the action a, and then produce a single Q(s, a) for each specific pair. This is an approach we do not want to take, since it requires |A(s)| passes through the network instead of a single pass (as explained here).
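The single-pass alternative can be sketched as a network that maps the full state (order features plus all driver states) to one Q-value per driver, so selecting a driver is a single argmax. The tiny NumPy MLP below uses random weights as a stand-in for a trained network; all sizes are assumptions for illustration:

```python
import numpy as np

NUM_DRIVERS = 5
STATE_DIM = 14  # assumed: order features plus all driver locations
HIDDEN = 32

rng = np.random.default_rng(0)
# Randomly initialised weights stand in for a trained DQN.
W1 = rng.standard_normal((STATE_DIM, HIDDEN)) * 0.1
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, NUM_DRIVERS)) * 0.1
b2 = np.zeros(NUM_DRIVERS)

def q_values(state):
    """One forward pass mapping the state to a Q-value per driver,
    so action selection is a single argmax rather than |A(s)| passes."""
    h = np.maximum(0.0, state @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

state = rng.standard_normal(STATE_DIM)
q = q_values(state)             # one Q-value per driver
best_driver = int(np.argmax(q)) # chosen action in a single pass
print(q.shape)  # (5,)
```

This is the standard DQN output layout (one head per discrete action); the only unusual part here is that the per-action information also appears inside the state input.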
x_1 would change, since the state of the driver changes. – Stef Aug 12 '21 at 15:30

In case vector x_0 only includes information about the location, the change to this vector is clear. However, it is preferable to include e.g. the order price in vector x_0. How does the order vector x_0 then change? For each order appearing, we do a forward pass through the network. The next order is therefore not yet known when assigning an order to a driver. My main struggle is that we want to incorporate the characteristics of the specific order and base the action on this, but that these cannot be incorporated in a new state. – Stef Aug 14 '21 at 16:20
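One way to handle the "next order is not yet known" issue in the comments, sketched below purely as an assumption (it is not proposed in the question itself), is to hold the transition open: store (s, a, r) when a driver is assigned, and only complete it with s' once the next order arrives and the next state can be built. All names here are hypothetical:

```python
import numpy as np

replay_buffer = []
pending = None  # (state, action, reward) awaiting its next state

def assign_order(state, action, reward):
    """Record the assignment; the transition stays open because the
    next order (and hence the next state) is not yet observed."""
    global pending
    pending = (state, action, reward)

def on_next_order(next_state):
    """Complete the pending transition now that s' is known, and
    push the full (s, a, r, s') tuple to the replay buffer."""
    global pending
    if pending is not None:
        s, a, r = pending
        replay_buffer.append((s, a, r, next_state))
        pending = None

s0 = np.zeros(4)  # toy state when the first order is assigned
s1 = np.ones(4)   # toy state built when the next order appears
assign_order(s0, action=2, reward=1.0)
on_next_order(s1)
print(len(replay_buffer))  # 1
```

Since DQN is off-policy and learns from a replay buffer, delaying the completion of a transition by one order does not change the learning algorithm itself.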