Are there any rules of thumb (or actual rules) for the minimum, maximum, and "reasonable" number of LSTM cells I should use? Specifically, I am referring to the num_units property of BasicLSTMCell in TensorFlow.
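For context, num_units is the size of the cell's hidden state, fixed when the cell is constructed. A minimal TF 1.x-style sketch (the value 64 is an arbitrary placeholder, not a recommendation):

```python
import tensorflow as tf

# num_units sets the dimensionality of the hidden state (and cell state).
# BasicLSTMCell is the TF 1.x API; tf.keras.layers.LSTMCell is its modern
# replacement and takes the same size via its `units` argument.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=64)  # 64 is arbitrary
```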
Please assume that I have a classification problem defined by:
t - number of time steps
n - length of the input vector at each time step
m - length of the output vector (number of classes)
i - number of training examples
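In array terms, the setup looks like this (a sketch only; the concrete values are arbitrary placeholders):

```python
import numpy as np

i, t, n, m = 1000, 20, 10, 4  # arbitrary placeholder values

X = np.zeros((i, t, n))       # i examples, t time steps, n features each
y = np.zeros((i, m))          # one-hot targets over m classes
```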
Is it true, for example, that the number of training examples should be larger than 4*((n+1)*m + m*m)*c, where c is the number of cells? I based this on the question "How to calculate the number of parameters of an LSTM network?" As I understand it, this expression gives the total number of parameters, which should be less than the number of training examples.
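For reference, here is a small sketch for checking the parameter count empirically. It uses tf.keras.layers.LSTM (the current replacement for BasicLSTMCell) and assumes it is parameterised the same way; the values of n, c, and t are arbitrary:

```python
import tensorflow as tf

n = 10  # input vector length per time step (arbitrary)
c = 32  # num_units, i.e. number of cells (arbitrary)
t = 5   # number of time steps (arbitrary)

# tf.keras.layers.LSTM is the modern equivalent of BasicLSTMCell
# wrapped in a dynamic RNN.
layer = tf.keras.layers.LSTM(units=c)
layer.build(input_shape=(None, t, n))

# TensorFlow's own count vs. the closed-form count from the linked
# question, rewritten in this notation: 4 gates, each with an (n x c)
# input kernel, a (c x c) recurrent kernel, and a bias of length c.
print(layer.count_params())      # 5504
print(4 * (n * c + c * c + c))   # 4*((n+1)*c + c*c) = 5504
```

Whether that per-layer count should additionally be scaled by c, as in my expression above, is exactly the part I am unsure about.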
In summary, they suggest the obvious: increasing the number of LSTM blocks per hidden layer improves performance but has diminishing returns and increases training time. – CubeBot88 Jun 10 '19 at 12:31