Data preprocessing for time series prediction

Question

I have a dataset with the following structure:

[
 [
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 1
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 2
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 3
                               :
 ],
 [
  [ product 2 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 1
  [ product 2 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 2
  [ product 2 ,shelf number, position on the tray, time of stay on the shelf, was sold?], # Hour 3
                              :
 ],
                              :
]

My goal is to predict, for a new product (say product_n), whether it will be sold, 3 hours in advance.

My question is how to preprocess this data for a recurrent neural network, given that the prediction target, was sold?, is available for every hour.

In more detail, since

[
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 1
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 2
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 3
                               :
 ],

is one observation for the RNN, how do I assign was sold? to it, given that len(X) must equal len(y)?

was sold? is available for every hour. Do I take the max over the 3 hours and assign it to the observation?

For example:

X = [
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 1
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 2
  [ product 1 ,shelf number, position on the tray, time of stay on the shelf], # Hour 3
                               :
 ],
and 
y = [max(was sold?)]

score 0 · Answer 1 · answered Jun 28 '20 at 10:34

So the question is how to represent this dataset so that you can use it for your classification task.

First, group the features by product, if you are classifying on a product-by-product basis.

Second, encode each feature according to its type:

For categorical features (e.g. the specific product), we normally use one-hot encoded vectors (https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/).
For numerical features (e.g. time of stay on the shelf, in minutes or hours), we can use the values themselves.
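
As a rough illustration of the two encodings above, here is a minimal NumPy sketch. The toy values, the variable names, and the assumption that products are already integer-coded (0, 1, 2, ...) are all hypothetical and not taken from the question's data.

```python
import numpy as np

# Hypothetical toy data: the product is categorical, time on shelf is numerical.
product_ids = np.array([0, 2, 1, 2])             # assumed integer codes for products
n_products = 3                                   # assumed number of distinct products
time_on_shelf = np.array([1.0, 2.0, 3.0, 4.0])   # hours, used as raw values

# One-hot encode: row i of the identity matrix is the one-hot vector for category i.
product_one_hot = np.eye(n_products)[product_ids]

print(product_one_hot.shape)   # one row per observation, one column per product
print(time_on_shelf)           # numerical feature is left unchanged
```

In practice a library encoder could be used instead; the identity-matrix indexing is only meant to show what the encoded vectors look like.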

Then, for the input to your model, simply concatenate these features together to form a "long" n-dimensional vector.
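
The concatenation step might look like the sketch below. It assumes one feature vector is built per hourly row and that the rows of each product are stacked into a sequence, giving an array of shape (products, hours, features), which is the layout most RNN libraries expect. The helper `build_step_vector`, the toy values, and the choice to treat shelf number as categorical and tray position as numerical are assumptions made for illustration only.

```python
import numpy as np

def build_step_vector(product_id, shelf, position, time_on_shelf,
                      n_products, n_shelves):
    """Concatenate one-hot and numerical features for a single hour (hypothetical helper)."""
    product_vec = np.eye(n_products)[product_id]   # categorical -> one-hot
    shelf_vec = np.eye(n_shelves)[shelf]           # assumed categorical -> one-hot
    numeric = np.array([position, time_on_shelf], dtype=float)  # kept as values
    return np.concatenate([product_vec, shelf_vec, numeric])

# Hypothetical data: 2 products, 3 hourly rows each,
# each row is (product_id, shelf, position, time_on_shelf).
raw = [
    [(0, 1, 4, 1.0), (0, 1, 4, 2.0), (0, 1, 4, 3.0)],
    [(1, 0, 2, 1.0), (1, 0, 2, 2.0), (1, 0, 2, 3.0)],
]
N_PRODUCTS, N_SHELVES = 2, 2

# Stack into (n_products, n_hours, n_features).
X = np.array([
    [build_step_vector(*step, N_PRODUCTS, N_SHELVES) for step in seq]
    for seq in raw
])
print(X.shape)
```

Each product then contributes one sequence to X, so a label array y with one entry per product would have the same length as X.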
