
I'm using a basic RNN as in the figure below (say for translation). The model has the following structure:

$$\begin{aligned} s_t &= \tanh(Ux_t + Ws_{t-1}) \\ o_t &= \mathrm{softmax}(Vs_t) \end{aligned}$$

  • Assume that the vocabulary size is $m$ and the hidden-layer size is $n$.
  • If $x_{t} \in \{0,1\}^{m}$ and $U$ is an $n \times m$ matrix, then $W$ is an $n \times n$ matrix.
  • If $o_{t} \in \mathbb{R}^{k}$ and $s_{t} \in \mathbb{R}^{n}$, then $V$ is a $k \times n$ matrix.
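As a sanity check on these shapes, here is a minimal NumPy sketch of one RNN step; the sizes $m$, $n$, $k$ are illustrative choices, not values from the question:

```python
import numpy as np

# Illustrative sizes: m = vocabulary, n = hidden, k = output.
m, n, k = 10, 4, 10
rng = np.random.default_rng(0)

U = rng.normal(size=(n, m))  # input-to-hidden
W = rng.normal(size=(n, n))  # hidden-to-hidden
V = rng.normal(size=(k, n))  # hidden-to-output

x_t = np.zeros(m)
x_t[3] = 1.0                 # one-hot input vector in {0,1}^m
s_prev = np.zeros(n)         # previous hidden state s_{t-1}

s_t = np.tanh(U @ x_t + W @ s_prev)        # s_t = tanh(U x_t + W s_{t-1})
z = V @ s_t
o_t = np.exp(z - z.max())
o_t /= o_t.sum()                           # o_t = softmax(V s_t)

print(s_t.shape, o_t.shape)  # (4,) (10,)
```

The matrix shapes follow directly from the bullet points: $U x_t$ and $W s_{t-1}$ must both be length-$n$ vectors, and $V s_t$ must be length $k$.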

What is the number of parameters in this RNN model?

[Figure: a basic RNN]

wabbit

2 Answers


The matrices $W$, $U$, and $V$ are shared across all time steps of the RNN, and they are the only parameters in the model described in the figure. Hence the number of parameters to be learnt while training is $\dim(W) + \dim(V) + \dim(U)$.

Based on the data in the question, this equals $n^{2} + kn + nm$,

where,

  • n - dimension of hidden layer
  • k - dimension of output layer
  • m - dimension of input layer
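The count above can be checked with a short computation; the dimensions below are illustrative placeholders, not values given in the question:

```python
# Illustrative dimensions: m = input (vocabulary), n = hidden, k = output.
m, n, k = 8000, 100, 8000

# dim(W) + dim(V) + dim(U) = n*n + k*n + n*m
params = n * n + k * n + n * m
print(params)  # 1610000
```

Note that with a large vocabulary, the $nm$ and $kn$ terms dominate: almost all parameters sit in the input and output projections, not in the recurrent matrix $W$.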
Gags
  • I'm not trying to gain reputation by answering my own question. Just wanted to document this somewhere since I found it useful and perhaps someone else will find it useful too. – wabbit Mar 08 '16 at 07:17
  • It is okay, as long as it is helpful to the users. Please add a more clear and detailed answer :) – Dawny33 Mar 08 '16 at 07:50

This is correct if biases are not included. Including biases $b_o$ and $b_h$ adds more parameters: $b_o$ has one parameter per output unit ($k$ in total), and $b_h$ has one parameter per hidden unit ($n$ in total). Hence the final value is:

$n^2 + n + mn + kn + k$
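The formula with biases can be cross-checked by counting the entries of the actual parameter arrays; the sizes are again illustrative placeholders:

```python
import numpy as np

# Illustrative dimensions: m = input, n = hidden, k = output.
m, n, k = 8000, 100, 8000

U = np.zeros((n, m))   # input-to-hidden weights
W = np.zeros((n, n))   # hidden-to-hidden weights
V = np.zeros((k, n))   # hidden-to-output weights
b_h = np.zeros(n)      # hidden bias
b_o = np.zeros(k)      # output bias

# Total entries across all parameter arrays must match the formula.
total = sum(a.size for a in (U, W, V, b_h, b_o))
formula = n * n + n + m * n + k * n + k
assert total == formula
print(total)  # 1618100
```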