I'm trying to create a model for when a shipped product will arrive at its destination. There are several stages the delivery goes through, so it's not just drive time from point A to point B. My first model looks at the status of the product at the first stage and uses average delivery times for that product type to predict the # of minutes after the first stage that the product will be delivered. I want to make another model that gives more of a continuous prediction, taking into account how long the delivery has been in a certain stage. For example, if most deliveries are done with the first stage after 15 minutes and it's been 10 minutes, the model should account for that in the ETA. How would I approach this? I can feed in the # of minutes it's been in the stage as an input, but it seems like I could generate a huge amount of data from each example that way. Sorry if this isn't a clear question.
- Do you have a fixed number of stages, and is it only when a stage is passed that you know the status and time so far? If yes, my first idea would be to build a model for each stage, then sum the predictions of every stage that has not been passed yet in order to calculate the ETA. – Erwan Feb 28 '20 at 01:36
- There are a fixed number of stages. The key requirement I'm getting is that the model needs to be able to provide updated estimates within stages as well. So far I've been working on a model that takes a time increment (minutes since the start) as input and outputs the time in each stage. This allows the prediction to change at each minute. – kevin.w.johnson Feb 28 '20 at 12:42
2 Answers
If I understand correctly (not sure), it looks to me like you don't need a model which can predict at any time. You just need:
- a model which predicts the ETA at each stage given information about the past stages. The easiest way to do that is probably to just train a different model for each stage, since the number of stages is fixed.
- Then between two stages the ETA can be updated in a deterministic way: if the last stage was passed at time $t$ and the predicted ETA at that point was, say, 10 min, then at time $t'$ the ETA is just $10 - (t' - t)$ min.
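The two points above could be sketched roughly as follows. This is only an illustration: the function names and the per-stage numbers are made up, the per-stage predictions are assumed to come from whatever per-stage models you train, and clamping the result at zero is my own addition for deliveries that run past the prediction.

```python
def eta_at_stage_start(stage_predictions, current_stage):
    """Sum the predicted durations (minutes) of the current and later stages.

    stage_predictions is a hypothetical list with one predicted duration
    per stage, e.g. the outputs of one model per stage.
    """
    return sum(stage_predictions[current_stage:])


def updated_eta(eta_at_last_event, t_last_event, t_now):
    """Count the ETA down deterministically between two stage events.

    Implements ETA(t') = ETA(t) - (t' - t). The max(0.0, ...) clamp is an
    assumption added here so that an overdue delivery never shows a
    negative ETA.
    """
    return max(0.0, eta_at_last_event - (t_now - t_last_event))


# Made-up predicted durations for three stages, in minutes.
stage_predictions = [15.0, 30.0, 20.0]

# The delivery entered stage 1 at minute 100; it is now minute 112.
eta = eta_at_stage_start(stage_predictions, current_stage=1)
print(updated_eta(eta, t_last_event=100.0, t_now=112.0))
```

Note that this countdown never revises the prediction for the current stage itself; it only subtracts the elapsed time.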

Erwan
It often makes sense to start by simplifying the problem. In your case, you could frame it as a regression problem: predict the total time from shipping to arrival based on all the available features. Modeling it this way is intentionally stateless; it ignores the sequential stages.
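As a minimal sketch of such a stateless baseline, here is an ordinary least-squares fit with a single feature in plain Python. The feature (`distance_km`) and all the numbers are invented for illustration; a real baseline would use all your available features and most likely a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data: one row per completed delivery.
distance_km = [2.0, 4.0, 6.0, 8.0]
total_minutes = [30.0, 40.0, 50.0, 60.0]

slope, intercept = fit_line(distance_km, total_minutes)

# Predicted total shipping-to-arrival time for a hypothetical 5 km delivery.
print(slope * 5.0 + intercept)
```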
Then start adding complexity. A more complex model would use conditional probabilities based on the current state, for example a probabilistic graphical model (PGM). PGMs are far more difficult to fit than linear regression, and the marginal gains in performance might not be worth the additional modeling complexity.
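Short of a full PGM, one simple way to condition on state is an empirical estimate of the remaining time in a stage given how long the delivery has already been in it. The sketch below is not a PGM, just an average over historical deliveries that were still in the stage at the same elapsed time; the function name and the durations are made up.

```python
def expected_remaining(durations, elapsed):
    """Average remaining minutes in a stage, given `elapsed` minutes so far.

    durations: historical stage durations (minutes) for completed deliveries.
    Only deliveries that took longer than `elapsed` are comparable, so the
    estimate is mean(d for d in durations if d > elapsed) - elapsed.
    Returns None when no historical delivery lasted that long.
    """
    still_running = [d for d in durations if d > elapsed]
    if not still_running:
        return None
    return sum(still_running) / len(still_running) - elapsed


# Made-up historical durations for one stage, in minutes.
stage_durations = [8, 12, 14, 15, 16, 20, 30]

# Estimate at the moment the stage starts, and again after 10 minutes.
print(expected_remaining(stage_durations, 0))
print(expected_remaining(stage_durations, 10))
```

With enough history, the same idea can be applied per product type or per stage, and the estimate for the current stage can be added to the predictions for the remaining stages.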

Brian Spiering