GRU/LSTM models - Train/Test split

Question

I've driven myself into a corner with this. Can someone please explain?

I feel I'm missing something obvious...

If, for an LSTM, each layer is trained with inputs from t and t-1, then that would mean that if I've got a training set of 10 000 observations, the network is trained to take in 10 000 observations and produce a result as a function of all of them. If I use it on a test set of, say, 1 000 observations, why would it work?

Or, if I want to make a prediction from a single observation, why would that work at all?

Should, in the case of LSTMs, the train test (in the toy example above) be 10 000 observations long (i.e. 9 000 old 'train' observations and 1 000 new 'test' ones)?

Hello HandyAndy, can you clarify which kind of dataset you are using? Is it a time series? — TQA, Jan 16 '19 at 10:29

score 1 · Answer 1 · answered Jan 16 '19 at 10:37

LSTMs are often used for sequence prediction problems, for example when the dataset is a time series.

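With that kind of dataset, you don't split the data the usual (i.e. random) way, because a random split introduces look-ahead bias: the model would be trained on observations that come after some of the test observations. Instead, we should train with "old observations" and test with "new observations", as you said.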

Should, in the case of LSTMs, the train test (in the toy example above) be 10 000 observations long (i.e. 9 000 old 'train' observations and 1 000 new 'test' ones)?

This sentence is a little bit unclear to me. If you have a dataset with 10 000 observations, you can train on the 9 000 oldest observations and test on the 1 000 newest ones. In any case, the training set shouldn't contain the test data.
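A minimal Python sketch of the chronological split described above, contrasted with a random split. It assumes the observations are already sorted by time and uses a hypothetical series where observation i was recorded at step i; `chronological_split` and `random_split` are illustrative helpers, not functions from any library.

```python
import random

def chronological_split(series, test_size):
    """Split a time-ordered sequence: oldest observations train, newest test."""
    if not 0 < test_size < len(series):
        raise ValueError("test_size must be between 1 and len(series) - 1")
    cut = len(series) - test_size
    return series[:cut], series[cut:]

def random_split(series, test_size, seed=0):
    """Shuffled split, shown only for contrast; not suitable for time series."""
    indices = list(range(len(series)))
    random.Random(seed).shuffle(indices)
    test_idx = sorted(indices[:test_size])
    train_idx = sorted(indices[test_size:])
    return [series[i] for i in train_idx], [series[i] for i in test_idx]

# Hypothetical series: observation i was recorded at time step i.
series = list(range(10_000))

train, test = chronological_split(series, test_size=1_000)
print(len(train), len(test))       # 9000 1000
print(max(train) < min(test))      # True: every training point precedes the test period

r_train, r_test = random_split(series, test_size=1_000)
# With shuffling, some training observations lie after some test observations,
# so the model learns from the "future" of the test period (look-ahead bias).
print(max(r_train) > min(r_test))  # True
```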

Here's an example of how to split a time series dataset. Suppose we have data from 01-01-2000 to 31-12-2004; we can choose 01-01-2004 as the split day:

Training set: data from 01-01-2000 to 31-12-2003
Test set: data from 01-01-2004 to 31-12-2004
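The date-based split above can be sketched the same way. This assumes daily records stored as (date, value) pairs in time order; the values are placeholders and `split_by_date` is a hypothetical helper.

```python
from datetime import date, timedelta

def split_by_date(records, split_day):
    """Train on records strictly before split_day; test on records from split_day on."""
    train = [(d, v) for d, v in records if d < split_day]
    test = [(d, v) for d, v in records if d >= split_day]
    return train, test

# Hypothetical daily series from 01-01-2000 to 31-12-2004 (values are placeholders).
start, end = date(2000, 1, 1), date(2004, 12, 31)
n_days = (end - start).days + 1
records = [(start + timedelta(days=i), float(i)) for i in range(n_days)]

train, test = split_by_date(records, split_day=date(2004, 1, 1))
print(train[0][0], train[-1][0])  # 2000-01-01 2003-12-31
print(test[0][0], test[-1][0])    # 2004-01-01 2004-12-31
```

Because the split is a single cut-off date, every training record precedes every test record, which is exactly the property the answer asks for.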

In this case, no knowledge from the future is used in the training phase and we can use the model to predict the data in the test set.

If I have 5 years of data, why would it be wrong to use the first and last years as test data and the middle three years as training data? If I prepare the batches from the first and last years separately, so that no single batch mixes data from both, how can that add bias? — Ather Cheema, Jun 08 '20 at 00:57
