Role of stateful parameter vs shuffle parameter in LSTM keras

Question

I'm trying to make predictions on a multivariate time series using an LSTM. I know that stateful=True in a Keras LSTM means that the state (hidden) of the sequence at index i in a batch is passed to the sequence at the same index in the next batch. This ensures 'long-term' dependency. In contrast, stateful=False will reset the state on seeing each sequence. There will be no concept of batches here. This is my source: http://philipperemy.github.io/keras-stateful-lstm/

I had these further doubts though:

When we say reset states, do we mean the hidden states or the cell states?
If I set stateful=False, what's the difference between shuffle=True and shuffle=False in this context?
Similarly, is it not recommended to keep shuffle=True while using a stateful LSTM? I assume it's bad practice because we need states to pass at the same index across every batch, and shuffling defeats the purpose.

noe · Accepted Answer · 2023-10-23T10:37:00.383

When we say reset states, we mean hidden ones or cell states?

We mean both: the cell state and the hidden state are reset to their initial values.
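To make the "both states" point concrete, here is a minimal sketch of a one-unit LSTM in plain Python. It is not Keras code: the function names (`lstm_step`, `run`) and the toy weights in `w` are assumptions made for illustration, and the cell is simplified (one shared weight per gate, no biases). It only shows that the carried state is the pair (h, c), and that a reset puts both back to zero.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w):
    """One step of a simplified scalar LSTM cell; `w` holds hypothetical toy weights."""
    pre = x + h                         # simplified: input and hidden state share a weight
    i = sigmoid(w["i"] * pre)           # input gate
    f = sigmoid(w["f"] * pre)           # forget gate
    o = sigmoid(w["o"] * pre)           # output gate
    g = math.tanh(w["g"] * pre)         # candidate cell value
    c_new = f * c + i * g               # new cell state
    h_new = o * math.tanh(c_new)        # new hidden state
    return h_new, c_new

def run(sequence, state, w):
    """Feed a sequence through the cell, starting from state = (h, c)."""
    h, c = state
    for x in sequence:
        h, c = lstm_step(x, h, c, w)
    return h, c

ZERO_STATE = (0.0, 0.0)                 # a reset zeroes both h and c
w = {"i": 0.5, "f": 0.5, "o": 0.5, "g": 0.5}

first = run([1.0, 2.0, 3.0], ZERO_STATE, w)
carried = run([4.0, 5.0], first, w)       # stateful-style: continue from (h, c)
fresh = run([4.0, 5.0], ZERO_STATE, w)    # after a reset: both h and c start at zero
```

Here `carried` and `fresh` differ even though they process the same inputs, because the first starts from the (h, c) pair left behind by the earlier sequence.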

If I set stateful=False, what's the difference between shuffle=True and shuffle=False in this context?

The shuffle argument of model.fit controls whether the training data is shuffled before it is used to train the model. If we have stateful=True, we need shuffle=False to keep the sequences contiguous between successive batches, as shown in the figure below (source):
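The contiguity requirement can be sketched in plain Python. The helper below is hypothetical (the name `make_stateful_batches` and its parameters are not part of Keras); it assumes one long series that is cut into `batch_size` parallel streams, so that row i of batch k+1 starts exactly where row i of batch k ended.

```python
def make_stateful_batches(series, batch_size, steps):
    """Lay out `series` so that row i of batch k+1 continues row i of batch k."""
    stream_len = len(series) // batch_size   # length of each parallel stream
    n_batches = stream_len // steps          # number of windows per stream
    batches = []
    for k in range(n_batches):
        batch = []
        for i in range(batch_size):
            # Row i always reads from stream i, window k.
            start = i * stream_len + k * steps
            batch.append(series[start:start + steps])
        batches.append(batch)
    return batches

series = list(range(24))                     # toy series: the values are time indices
batches = make_stateful_batches(series, batch_size=2, steps=3)

for k, batch in enumerate(batches):
    print(k, batch)
# 0 [[0, 1, 2], [12, 13, 14]]
# 1 [[3, 4, 5], [15, 16, 17]]
# 2 [[6, 7, 8], [18, 19, 20]]
# 3 [[9, 10, 11], [21, 22, 23]]
```

Flattening these batches in order and training with the same batch size and shuffle=False should preserve this alignment, which is what lets the carried-over state match the data it is applied to.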

However, if we have stateful=False, we must choose an appropriate value for shuffle according to our task and data. If we don't shuffle, the result of the training will depend on how the data was ordered at its source. For instance, if the data is sorted by sequence length from longer to shorter sequences, then at the end of each epoch the model will probably show a bias toward short sequences, because that's what it has seen most recently. Normally, we should use shuffle=True unless there is a specific reason not to (e.g. the data was already sorted according to some criterion that helps the model avoid a bias that would probably be present with random shuffling).

Similarly, is it not recommended to keep shuffle=True while using a stateful LSTM? I assume it's bad practice because we need states to pass at the same index across every batch, and shuffling defeats the purpose.

As you can see from the figure above, using shuffle=True would prevent the model from having consecutive sequences at the same position in consecutive batches. Without that, a stateful LSTM would just be detrimental to learning to interpret long sequences, because the sequences seen during training would not be consecutive but random. Therefore, combining stateful=True with shuffle=True leads to feeding garbage initial states to the LSTM. This does not mean that the model will not learn. Neural nets are surprisingly good at handling garbage input data and still generating good outputs. However, the model will probably perform better on long sequences with shuffle=False.
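The effect of shuffling on this alignment can be checked with a small sketch in plain Python. Everything here is illustrative: `rows_are_contiguous` and `to_batches` are hypothetical helpers, the samples are toy time indices, and `random.shuffle` is only a stand-in for what shuffle=True does to the sample order.

```python
import random

def rows_are_contiguous(batches):
    """True if row i of every batch starts right after row i of the previous batch ends."""
    for prev, nxt in zip(batches, batches[1:]):
        for prev_row, next_row in zip(prev, nxt):
            if next_row[0] != prev_row[-1] + 1:
                return False
    return True

def to_batches(samples, batch_size):
    """Group a flat list of samples into consecutive batches."""
    return [samples[k:k + batch_size] for k in range(0, len(samples), batch_size)]

# Toy samples: time indices of two parallel streams (0..11 and 100..111),
# interleaved so that with batch_size=2 each stream stays in the same batch row.
samples = [
    [0, 1, 2], [100, 101, 102],
    [3, 4, 5], [103, 104, 105],
    [6, 7, 8], [106, 107, 108],
    [9, 10, 11], [109, 110, 111],
]

ordered = to_batches(samples, batch_size=2)

shuffled_samples = samples[:]
random.Random(0).shuffle(shuffled_samples)       # stand-in for shuffle=True
shuffled = to_batches(shuffled_samples, batch_size=2)

print(rows_are_contiguous(ordered))     # True: carried-over state matches the data
print(rows_are_contiguous(shuffled))    # depends on the permutation; a random order rarely stays contiguous
```

With the original order, the state left by row i is handed to the window that actually follows it. After a random permutation, the state is almost always handed to an unrelated window, which is the "garbage initial state" situation described above.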

You may also find other questions about stateful LSTMs here on Data Science SE useful, e.g. this and this.
