How many parameters does a single stacked LSTM have? The number of parameters imposes a lower bound on the number of training examples required and also influences the training time. Hence knowing the number of parameters is useful for training models using LSTMs.
5 Answers
The LSTM has a set of two matrices, $U$ and $W$, for each of its three gates; these matrices multiply the input $x$ and the output $h$ respectively.
- U has dimensions $n \times m$
- W has dimensions $n \times n$
- there is a different set of these matrices for each of the three gates (e.g., $U_{forget}$ for the forget gate)
- there is another set of these matrices for updating the cell state S
- on top of the mentioned matrices, you need to count the biases (not in the picture)
Hence the total number of parameters is $4(nm + n^2 + n)$.
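As a quick sanity check, here is a minimal Python sketch of this count (the function name lstm_param_count is my own, not from any library):

def lstm_param_count(m, n):
    # 4 gate-like blocks, each with U (n x m), W (n x n) and a bias vector (n)
    return 4 * (n * m + n * n + n)

print(lstm_param_count(25, 100))  # 50400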

Following the previous answers, the number of parameters of an LSTM taking input vectors of size $m$ and giving output vectors of size $n$ is:
$$4(nm+n^2)$$
However, if your LSTM includes bias vectors (this is the default in Keras, for example), the number becomes:
$$4(nm+n^2 + n)$$
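As a rough check of both formulas (a sketch, assuming the standalone keras package; use_bias toggles the bias vectors):

from keras.layers import Input, LSTM
from keras.models import Model

m, n = 3, 2
x = Input((None, m))
print(Model(x, LSTM(n)(x)).count_params())                  # 4*(nm + n^2 + n) = 48
print(Model(x, LSTM(n, use_bias=False)(x)).count_params())  # 4*(nm + n^2) = 40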


- This is the only complete answer. Every other answer appears content to ignore the case of bias neurons. – Feb 07 '18 at 14:11
- To give a concrete example, if your input has m=25 dimensions and you use an LSTM layer with n=100 units, then the number of params = 4 × (100×25 + 100² + 100) = 50,400. – arun Jun 20 '18 at 00:13
- Suppose I am using timestep data; is my understanding below correct? n=100 means I will have 100 timesteps in each sample (example), so I need 100 units. m=25 means at each timestep, I have 25 features like [weight, height, age, ...]. – jason zhang Mar 10 '19 at 06:41
- @jasonzhang The number of timesteps is not relevant, because the same LSTM cell is applied recursively to your input vectors (one vector for each timestep). What arun called "units" is also the size of each output vector, not the number of timesteps. – Adam Oudad Mar 11 '19 at 08:24
According to the LSTM cell structure and equations below:

[Figures: LSTM cell structure; LSTM equations]

Ignoring non-linearities: if the input $x_t$ is of size $n \times 1$ and there are $d$ memory cells, then each $W_*$ has size $d \times n$ and each $U_*$ has size $d \times d$. The stacked weight matrix $W$ then has size $4d \times (n+d)$. Note that each of the $d$ memory cells has its own weights $W_*$ and $U_*$, and that the only time memory cell values are shared with other LSTM units is during the product with $U_*$.
Thanks to Arun Mallya for the great presentation.
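A minimal NumPy sketch of these stacked shapes (variable names are my own, following the presentation's notation):

import numpy as np

n, d = 25, 100                # input size, number of memory cells
W = np.zeros((4 * d, n + d))  # four gates stacked; each block is [W_* | U_*]
b = np.zeros(4 * d)           # one bias per gate per memory cell
print(W.size + b.size)        # 4*d*(n + d) + 4*d = 50400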

For a complete answer and a good insight, visit: https://towardsdatascience.com/counting-no-of-parameters-in-deep-learning-models-by-hand-8f1716241889
- $g$: number of FFNNs in a unit (RNN has 1, GRU has 3, LSTM has 4)
- $h$: size of the hidden units
- $i$: dimension/size of the input
Since every FFNN (feed-forward neural network) has h(h+i) + h parameters, we have
num_params = g × [h(h+i) + h]
Example 2.1: LSTM with 2 hidden units and input dimension 3.
g = 4 (LSTM has 4 FFNNs)
h = 2
i = 3
num_params
= g × [h(h+i) + h]
= 4 × [2(2+3) + 2]
= 48
from keras.layers import Input, LSTM
from keras.models import Model
inputs = Input((None, 3))    # sequences of 3-dim feature vectors
lstm = LSTM(2)(inputs)       # 2 hidden units
model = Model(inputs, lstm)
model.summary()              # reports 48 trainable parameters
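A small helper that generalizes this count to RNN/GRU/LSTM cells (a sketch; num_params is my own name):

def num_params(g, h, i):
    # g: FFNNs per unit (RNN=1, GRU=3, LSTM=4); h: hidden size; i: input size
    return g * (h * (h + i) + h)

print(num_params(4, 2, 3))  # 48, matching the Keras model above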
Thanks to Raimi Karim.

To make it clearer, I annotated the diagram from http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
- $o_{t-1}$: previous output, dimension $n$ (to be exact, the last dimension has $n$ units)
- $i$: input, dimension $m$
- fg: forget gate
- ig: input gate
- update: update gate
- og: output gate
Since the dimension at each gate is $n$, getting $o_{t-1}$ and $i$ to each gate by matrix multiplication (dot product) needs $n^2 + mn$ parameters, plus $n$ biases. So the total is $4(n^2 + mn + n)$.
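The same total, computed per gate (a sketch using the m=25, n=100 example from the comments above):

m, n = 25, 100                # input size, output size
per_gate = n * n + m * n + n  # n x n for o_{t-1}, n x m for i, plus n biases
print(4 * per_gate)           # 50400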

PS: I didn't answer my own question just to gain reputation points. I want to know from the community whether my answer is right. – wabbit Mar 09 '16 at 11:17