
I'm using a basic RNN as in the figure below (say for translation). The model has the following structure:

$$\begin{aligned} s_t &= \tanh(Ux_t + Ws_{t-1}) \\ o_t &= \mathrm{softmax}(Vs_t) \end{aligned}$$

  • Assume that the vocabulary size is $m$ and the hidden-layer size is $n$.
  • If $x_{t} \in \{0,1\}^{m}$ and $U$ is an $n \times m$ matrix, then $W$ is an $n \times n$ matrix.
  • If $o_{t} \in \mathbb{R}^{k}$ and $s_{t} \in \mathbb{R}^{n}$, then $V$ is a $k \times n$ matrix.
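As a sanity check on these shapes, here is a minimal NumPy sketch of one RNN step; the sizes $m$, $n$, $k$ are illustrative choices, not values from the question:

```python
import numpy as np

# Illustrative sizes: m = vocabulary, n = hidden, k = output.
m, n, k = 10, 4, 10
rng = np.random.default_rng(0)

U = rng.normal(size=(n, m))  # input-to-hidden
W = rng.normal(size=(n, n))  # hidden-to-hidden
V = rng.normal(size=(k, n))  # hidden-to-output

x_t = np.zeros(m)
x_t[3] = 1.0                 # one-hot input vector in {0,1}^m
s_prev = np.zeros(n)         # previous hidden state s_{t-1}

s_t = np.tanh(U @ x_t + W @ s_prev)        # s_t = tanh(U x_t + W s_{t-1})
z = V @ s_t
o_t = np.exp(z - z.max())
o_t /= o_t.sum()                           # o_t = softmax(V s_t)

print(s_t.shape, o_t.shape)  # (4,) (10,)
```

The matrix shapes follow directly from the bullet points: $U x_t$ and $W s_{t-1}$ must both be length-$n$ vectors, and $V s_t$ must be length $k$.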

What is the number of parameters in this RNN model?

[Figure: a basic RNN]

wabbit

2 Answers


The matrices $W$, $U$, and $V$ are shared across all time steps of the RNN, and they are the only parameters in the model described in the figure. Hence the number of parameters to be learnt while training is $\dim(W) + \dim(V) + \dim(U)$.

Based on the data in the question, this equals $n^{2} + kn + nm$,

where,

  • n - dimension of hidden layer
  • k - dimension of output layer
  • m - dimension of input layer
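The count above can be checked with a short computation; the dimensions below are illustrative placeholders, not values given in the question:

```python
# Illustrative dimensions: m = input (vocabulary), n = hidden, k = output.
m, n, k = 8000, 100, 8000

# dim(W) + dim(V) + dim(U) = n*n + k*n + n*m
params = n * n + k * n + n * m
print(params)  # 1610000
```

Note that with a large vocabulary, the $nm$ and $kn$ terms dominate: almost all parameters sit in the input and output projections, not in the recurrent matrix $W$.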
Gags
  • I'm not trying to gain reputation by answering my own question. Just wanted to document this somewhere since I found it useful and perhaps someone else will find it useful too. – wabbit Mar 08 '16 at 07:17
  • It is okay, as long as it is helpful to the users. Please add a more clear and detailed answer :) – Dawny33 Mar 08 '16 at 07:50

This is correct if biases are not included. Including biases $b_o$ and $b_h$ adds more parameters: $b_o$ has one parameter per output unit ($k$ in total), and $b_h$ has one parameter per hidden unit ($n$ in total). Hence the final value is:

$n^2 + n + mn + kn + k$
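The formula with biases can be cross-checked by counting the entries of the actual parameter arrays; the sizes are again illustrative placeholders:

```python
import numpy as np

# Illustrative dimensions: m = input, n = hidden, k = output.
m, n, k = 8000, 100, 8000

U = np.zeros((n, m))   # input-to-hidden weights
W = np.zeros((n, n))   # hidden-to-hidden weights
V = np.zeros((k, n))   # hidden-to-output weights
b_h = np.zeros(n)      # hidden bias
b_o = np.zeros(k)      # output bias

# Total entries across all parameter arrays must match the formula.
total = sum(a.size for a in (U, W, V, b_h, b_o))
formula = n * n + n + m * n + k * n + k
assert total == formula
print(total)  # 1618100
```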