..., to create proper consecutive batches, where the nth input sequence in a batch starts off exactly where the nth input sequence ended in the previous batch.
Géron, Aurélien. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (Kindle Locations 12018-12020). O'Reilly Media. Kindle Edition.
So the data for a stateful RNN looks like this:
Why does a stateful RNN use batches like the bottom one? As far as I understand, the bottom layout gives the hidden state a longer memory: by the time the model reaches seq 4, the state carries information from seqs 1-2-3, whereas in the top layout the state before seq 4 only reflects seq 3.
Is there some reason for this?
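To make the quoted batching scheme concrete, here is a minimal sketch (my own illustration, not the book's code) of building consecutive batches from one long token series, so that row n of each batch starts exactly where row n of the previous batch ended:

```python
import numpy as np

def stateful_batches(data, batch_size, seq_len):
    # Split the long series into batch_size contiguous streams:
    # stream i holds tokens [i * stream_len, (i + 1) * stream_len).
    stream_len = len(data) // batch_size
    streams = np.array(data[:stream_len * batch_size]).reshape(batch_size, stream_len)
    # Each batch takes the next seq_len tokens from every stream, so
    # row i of batch t+1 continues exactly where row i of batch t ended.
    for start in range(0, stream_len - seq_len + 1, seq_len):
        yield streams[:, start:start + seq_len]

data = list(range(12))  # tokens 0..11
batches = list(stateful_batches(data, batch_size=2, seq_len=3))
# batches[0] is [[0, 1, 2], [6, 7, 8]]
# batches[1] is [[3, 4, 5], [9, 10, 11]]
# Row 0 of batch 1 ([3, 4, 5]) continues row 0 of batch 0 ([0, 1, 2]),
# so the hidden state kept for that row stays valid across batches.
```

With this layout, resetting the states only at epoch boundaries lets each row's hidden state flow through the whole series.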