I am using TF Eager to train a stateful RNN (GRU).
I have several variable-length time sequences, each about 1 minute long, which I split into windows of length 1 s.
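For concreteness, here is roughly how I do the windowing (the sampling rate and array layout are just placeholders for my actual data):

```python
import numpy as np

fs = 100              # samples per second (placeholder for my actual rate)
window_len = 1 * fs   # 1 s windows

def make_windows(seq):
    """Split a (timesteps, features) array into non-overlapping 1 s windows."""
    n_windows = len(seq) // window_len
    # Drop the trailing partial window, then split into equal chunks
    return np.split(seq[:n_windows * window_len], n_windows)
```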
In TF Eager, as in Keras, if stateful=True, "the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch." (source)
Thus, how should I design my batches? I obviously can't sample random windows from random sequences. I also can't split a sequence into windows and place adjacent windows in the same batch (e.g. batch 1 = [[seq1 0-1s], [seq1 1-2s], [seq1 2-3s], ...]), since the state from one window would then not be passed on to the next window, which is the whole point of a stateful RNN.
I was thinking of mixing sequences in the same batch as in:
batch 1 = [[seq1 0-1s], [seq2 0-1s], [seq3 0-1s], ...]
batch 2 = [[seq1 1-2s], [seq2 1-2s], [seq3 1-2s], ...]
...
However, the issue here is that the sequences have different lengths, so some will finish before others.
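To make the scheme concrete, here is a rough sketch of what I mean; zero-padding the shorter sequences is just a placeholder, since handling the sequences that finish early is exactly what I'm unsure about:

```python
import numpy as np

def make_batches(windows_per_seq, window_len, n_features):
    """windows_per_seq[k] is the list of 1 s windows of sequence k.

    Batch i stacks window i of every sequence, so sample index k always
    corresponds to sequence k and the GRU state carries over correctly.
    """
    n_batches = max(len(w) for w in windows_per_seq)
    pad = np.zeros((window_len, n_features))   # placeholder for finished sequences
    batches = []
    for i in range(n_batches):
        batch = [w[i] if i < len(w) else pad for w in windows_per_seq]
        batches.append(np.stack(batch))        # (n_seqs, window_len, n_features)
    return batches
```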
So what is the best way to implement this?
(FYI, I couldn't find anything in the academic literature or the blogosphere that discusses this, so references would be great.)
Thanks!