The act of pushing something off a conveyor is probably best handled by something real-time (microcontroller, PLC, etc.). When a box passes a gate and the pusher is armed, fire the piston. We understand how to fire a piston; no ML needed. The logic for when to arm the pusher could be learned by an RL algorithm.
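To make the split concrete, here is a minimal sketch of that arm-then-fire logic. It is written in Python only for illustration; on real hardware this would live on the PLC or microcontroller. The `Pusher` class and its method names are hypothetical, and the `fired` counter is a placeholder for actually driving the piston.

```python
from dataclasses import dataclass


@dataclass
class Pusher:
    """Hypothetical stand-in for the real-time controller."""
    armed: bool = False
    fired: int = 0  # placeholder for the piston output; counts pushes

    def set_armed(self, armed: bool) -> None:
        # Called by the decision logic (hand-written rules or a learned policy).
        self.armed = armed

    def on_gate_triggered(self) -> bool:
        # Called when the gate sensor sees a box. No ML here: fire if armed.
        if not self.armed:
            return False
        self.fired += 1
        return True


pusher = Pusher()
pusher.set_armed(True)                # decision made upstream
did_push = pusher.on_gate_triggered() # box reaches the gate
```

The point of the design is that the learned part only ever toggles `armed`; the timing-critical part stays simple and deterministic.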
The complexity of the model will primarily be driven by how Markovian the problem is. A problem is said to be Markovian if the decision can be made by looking at the information in the present situation only (no history needed). For Markovian problems a simple feed-forward network may be enough, with the output being a signal either to arm or disarm the pusher.
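As a rough illustration of the Markovian case, the sketch below is a tiny feed-forward network that maps the features of the present box to an arm/disarm signal. Everything specific here is an assumption: the three input features, the layer sizes, and the 0.5 threshold are made up, and the weights are random rather than trained. It uses only the standard library so it runs as-is; a real policy would be trained in a deep-learning framework.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    # Small random weights; a trained policy would load learned values instead.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    # One fully connected layer: activation(W x + b).
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def arm_probability(features, hidden, out):
    # Present observation in, probability of arming out. No history is used.
    return dense(dense(features, hidden, relu), out, sigmoid)[0]


rng = random.Random(0)
hidden = make_layer(3, 8, rng)
out = make_layer(8, 1, rng)

features = [0.4, 0.2, 1.0]  # hypothetical: weight, length, label-match flag
p = arm_probability(features, hidden, out)
arm = p > 0.5               # assumed threshold for the arm/disarm signal
```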
If the decision depends on history, that is, the decision to push the present box depends on the boxes which have passed before it, then one may consider an LSTM so the network can efficiently integrate the information necessary to solve n-step Markov problems or non-Markovian problems.
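To show what "integrating history" means mechanically, here is a sketch of a single LSTM cell with scalar input and state, using the standard gate equations. The weights are hand-picked and untrained, and the names `lstm_step`, `run` and `params` are hypothetical; the only purpose is to show that two sequences ending in the same present input leave the cell in different states.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, params):
    # params maps gate name -> (w_x, w_h, b); scalars keep the sketch readable.
    def gate(name, activation):
        w_x, w_h, b = params[name]
        return activation(w_x * x + w_h * h + b)

    i = gate("input", sigmoid)        # how much new information to write
    f = gate("forget", sigmoid)       # how much old cell state to keep
    o = gate("output", sigmoid)       # how much of the cell state to expose
    g = gate("candidate", math.tanh)  # the new information itself
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new


def run(sequence, params):
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


# Untrained, hand-picked weights (assumption, for illustration only).
params = {name: (1.0, 0.5, 0.0)
          for name in ("input", "forget", "output", "candidate")}

h_after_flagged_box = run([1.0, 0.0], params)  # an earlier box carried a signal
h_after_plain_boxes = run([0.0, 0.0], params)  # nothing notable came before
```

Both sequences end with the same present input, yet the hidden state differs, which is exactly the information a feed-forward network looking only at the present box would not have.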
On the off chance one REALLY wanted to do the whole process by ML, construct it like a videogame. Pushing correct packages off gives a positive reward and pushing incorrect ones a negative reward. A DQN-family algorithm would likely get the gist after several million frames. Predicting future position from a single frame is non-Markovian, since one frame shows where a box is but not how fast it is moving; the original DQN paper dealt with this by feeding a stack of four consecutive frames into a convolutional network. An LSTM would also work there.
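The two pieces of that setup can be sketched as follows: a reward function for the "game" and a small frame stack that turns single frames into an observation from which motion can be read. The names `reward` and `FrameStack` are hypothetical, the zero reward for letting a box pass is an assumption (the scheme above only specifies rewards for pushes), and a "frame" here is just a 1-D box position rather than an image. The stack depth is a free parameter chosen for the demo.

```python
from collections import deque


def reward(pushed: bool, should_push: bool) -> float:
    if not pushed:
        return 0.0  # assumption: letting a box pass is neutral
    return 1.0 if should_push else -1.0


class FrameStack:
    """Keeps the most recent frames so that motion becomes observable."""

    def __init__(self, n_frames: int):
        self.frames = deque(maxlen=n_frames)

    def reset(self, frame):
        # Start of an episode: fill the stack with copies of the first frame.
        self.frames.clear()
        self.frames.extend([frame] * self.frames.maxlen)
        return tuple(self.frames)

    def push(self, frame):
        # The oldest frame falls out automatically because of maxlen.
        self.frames.append(frame)
        return tuple(self.frames)


stack = FrameStack(n_frames=4)  # depth chosen for the demo
stack.reset(0.0)
stack.push(0.1)
obs = stack.push(0.2)

# A single frame gives position only; the stack also gives velocity.
velocity_per_frame = obs[-1] - obs[-2]
```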