
I am wondering which RL algorithms can be used in an environment where actions have to be performed only in specific situations. For example, consider a conveyor belt from which a box that fulfills certain conditions must be sorted out. A signal must then be sent at the right time to sort it out correctly; if the signal comes too early or too late, the box is missed.

Is it only the reward function that enables the agent to learn this, or do I need to consider special neural network architectures such as an LSTM, or certain RL algorithms?

Does anybody have experience with this kind of scenario and can give suggestions?

nbro
Martin S
    Do you want the agent to learn both what signal to send and the timing at which to send it? Or do you just want to guarantee that the algorithm has known complexity so you can make sure that the signal is sent at the right time (i.e., the algorithm will take at most X ms to process its input and update its actions based on the last feedback, so it will be ready in time for the next item on the conveyor)? – Eponymous Jan 05 '24 at 12:14
    Also, (from the "related" tab), consider PID. https://ai.stackexchange.com/q/12472/61494. When to take an action sounds like the kind of well-defined problem suited to PID. With a simple memory that recalls which action was chosen, you could decompose the system into "Which action should I take" and "Did I take action successfully?" You can use these to send feedback to the "action chooser" (RL) and individual "action taker" (PID) controllers. Things are a bit more complex if the action to take depends on how much time will be required, but I think still decomposable. – Eponymous Jan 05 '24 at 12:35

1 Answer


The act of pushing something off a conveyor is probably best handled by something real-time (a microcontroller, PLC, etc.). When a box passes a gate and the pusher is armed, the piston fires. We understand how to fire a piston; no ML is needed there. The logic for when to arm the pusher is what could be learned by an RL algorithm.

The complexity of the model will primarily be driven by how Markovian the problem is. A problem is said to be Markovian if the decision can be made from the present observation alone (no history needed). For Markovian problems a simple feed-forward network may be enough, with the output being a signal to either arm or disarm the pusher.
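For the Markovian case, the policy network can be as small as this sketch. The input features (weight, width, colour score, belt position) are made-up examples, and the randomly initialised weights stand in for whatever a trained policy would actually contain:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

# Tiny feed-forward policy: 4 input features -> 8 hidden units -> 2 actions.
# Random weights here are a placeholder for trained parameters.
W1 = rng.normal(size=(4, 8)) * 0.5
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 2)) * 0.5
b2 = np.zeros(2)

def policy(features):
    """Return 0 = leave pusher disarmed, 1 = arm pusher."""
    h = relu(features @ W1 + b1)
    logits = h @ W2 + b2
    return int(np.argmax(logits))

box = np.array([1.2, 0.3, 0.9, 0.5])  # hypothetical features of one box
action = policy(box)
```

The point is only the shape of the mapping: current box features in, arm/disarm decision out, with no memory of previous boxes.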

If the decision depends on history, i.e. whether to push the present box depends on the boxes that passed before it, then one may consider an LSTM, so the network can efficiently integrate the information necessary to solve n-step Markov or non-Markovian problems.

On the off chance one really wanted to do the whole process by ML, construct it like a videogame: pushing correct packages off yields a positive reward and pushing incorrect ones a negative reward. A DQN-family algorithm would likely get the gist after several million frames. Predicting future position from a single frame is non-Markovian; the original DQN paper stacked the last four frames as input to a convolutional network to work around this. An LSTM would also work there.
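The "videogame" framing can be shown on a toy version of the problem. In this made-up setup a box moves along belt positions 0..9 and must be pushed exactly at position 6; tabular Q-learning stands in for the DQN family, since the state space is tiny:

```python
import numpy as np

N, TARGET = 10, 6              # belt positions; correct push point
rng = np.random.default_rng(0)
Q = np.zeros((N, 2))           # actions: 0 = wait, 1 = push
alpha, gamma, eps = 0.5, 0.95, 0.1

for episode in range(5000):
    s = 0
    while True:
        # Epsilon-greedy action selection.
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        if a == 1:                       # pushed: episode ends
            r, done, s2 = (1.0 if s == TARGET else -1.0), True, s
        elif s == N - 1:                 # box fell off the end: missed
            r, done, s2 = -1.0, True, s
        else:                            # box advances one position
            r, done, s2 = 0.0, False, s + 1
        target = r if done else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])
        if done:
            break
        s = s2

greedy = np.argmax(Q, axis=1)  # learned policy per belt position
```

After training, the greedy policy waits at every position before the target and pushes exactly at it, which is the timing behaviour the question asks about, learned purely from the reward signal.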

foreverska