What does "residual error" mean in the context of LSTMs?
Taken from the middle of section 3 of this paper, where it says:
"...of the residual error $\epsilon$"
Where $s_0$ is the initial state of the RNN.
Question: how is a residual error different from an ordinary error? Why use such a term?