I want to compare three different types of RNNs to decide which architecture can handle my data best.
To do that, I want them to have the same complexity. Can I simply define the complexity by the number of trainable parameters? If not, why?
If the parameter count is a valid measure of complexity, how do I correctly count the trainable parameters of a vanilla RNN cell, a GRU cell, and an LSTM cell?
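
For concreteness, here is a minimal sketch of the textbook parameter counts as I understand them. It assumes a single layer, input size `d`, hidden size `h`, and that every gate (or candidate state) has one input weight matrix, one recurrent weight matrix, and one bias vector. The function names and the `num_biases` argument are my own, not from any library. Some frameworks store two bias vectors per gate, so the totals can differ from what a given implementation reports.

```python
def recurrent_cell_params(input_size, hidden_size, num_gates, num_biases=1):
    """Count trainable parameters of one recurrent cell.

    Assumption: each gate owns an input matrix (hidden x input),
    a recurrent matrix (hidden x hidden) and `num_biases` bias vectors.
    """
    weights = hidden_size * (input_size + hidden_size)
    biases = num_biases * hidden_size
    return num_gates * (weights + biases)


def vanilla_rnn_params(input_size, hidden_size, num_biases=1):
    # One transformation: the hidden-state update itself.
    return recurrent_cell_params(input_size, hidden_size, 1, num_biases)


def gru_params(input_size, hidden_size, num_biases=1):
    # Three transformations: reset gate, update gate, candidate state.
    return recurrent_cell_params(input_size, hidden_size, 3, num_biases)


def lstm_params(input_size, hidden_size, num_biases=1):
    # Four transformations: input, forget, output gates and cell candidate.
    return recurrent_cell_params(input_size, hidden_size, 4, num_biases)


if __name__ == "__main__":
    d, h = 10, 20
    print("vanilla RNN:", vanilla_rnn_params(d, h))
    print("GRU:        ", gru_params(d, h))
    print("LSTM:       ", lstm_params(d, h))
```

Under these assumptions, for the same `d` and `h` a GRU has three times and an LSTM four times the parameters of a vanilla RNN cell, so matching the totals would mean choosing a different hidden size for each architecture.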