Chapter 10 of the Deep Learning book gives the following equations for an RNN's forward pass and loss:
$$ \begin{align} a^{(t)} &= b + Wh^{(t-1)} + Ux^{(t)} \\ h^{(t)} &= \tanh(a^{(t)})\\ o^{(t)} &= c + Vh^{(t)}\\ \hat{y}^{(t)} &= \text{softmax}(o^{(t)})\\ \\ L &= \sum_t L^{(t)}\\ &= -\sum_t \log{p_{\text{model}}(y^{(t)}\ |\ x^{(1)},\dots,x^{(t)})} \end{align} $$ where $p_{\text{model}}(y^{(t)}\ |\ x^{(1)},\dots,x^{(t)})$ is given by reading the entry for $y^{(t)}$ from the model's output vector $\hat{y}^{(t)}$.
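To make the notation concrete, here is a minimal NumPy sketch of this forward pass and loss (the parameter names `W`, `U`, `V`, `b`, `c` follow the equations above; the layer sizes, random parameters, and targets are arbitrary assumptions for illustration, not anything from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes, assumed for illustration only.
n_in, n_hidden, n_out, T = 4, 5, 3, 6

# Parameters named as in the equations above.
U = rng.normal(scale=0.1, size=(n_hidden, n_in))      # input -> hidden
W = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # hidden -> hidden
V = rng.normal(scale=0.1, size=(n_out, n_hidden))     # hidden -> output
b = np.zeros(n_hidden)
c = np.zeros(n_out)

xs = rng.normal(size=(T, n_in))       # x^{(1)}, ..., x^{(T)}
ys = rng.integers(0, n_out, size=T)   # target class indices y^{(t)}

def softmax(o):
    e = np.exp(o - o.max())           # shift for numerical stability
    return e / e.sum()

h = np.zeros(n_hidden)                # h^{(0)}
L = 0.0
for t in range(T):
    a = b + W @ h + U @ xs[t]         # a^{(t)}
    h = np.tanh(a)                    # h^{(t)}
    o = c + V @ h                     # o^{(t)}
    y_hat = softmax(o)                # \hat{y}^{(t)}
    L += -np.log(y_hat[ys[t]])        # L^{(t)} = -log p_model(y^{(t)} | x^{(1)}, ..., x^{(t)})

print("total loss L =", L)
```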
...
$$ \begin{align} \frac{\partial L}{\partial L^{(t)}} &= 1\\ (\nabla_{\pmb{o}^{(t)}}L)_i &= \frac{\partial L}{\partial L^{(t)}}\frac{\partial L^{(t)}}{\partial o_i^{(t)}} = \hat{y}_i^{(t)} - \pmb{1}_{i=y^{(t)}} \end{align} $$
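As a sanity check (not from the book), here is a small finite-difference sketch comparing the stated gradient $\hat{y}_i^{(t)} - \pmb{1}_{i=y^{(t)}}$ at a single time step against a numerical gradient; the logits `o` and target index `y` are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_out = 3
o = rng.normal(size=n_out)        # o^{(t)}, arbitrary logits
y = 1                             # target class index y^{(t)}

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def loss(o):
    return -np.log(softmax(o)[y]) # L^{(t)} = -log \hat{y}_{y^{(t)}}

# Gradient as stated above: \hat{y}_i - 1_{i = y^{(t)}}
analytic = softmax(o)
analytic[y] -= 1.0

# Central finite differences
eps = 1e-6
numeric = np.zeros(n_out)
for i in range(n_out):
    e_i = np.zeros(n_out)
    e_i[i] = eps
    numeric[i] = (loss(o + e_i) - loss(o - e_i)) / (2 * eps)

print(analytic)
print(numeric)  # agrees with the analytic gradient to ~1e-9
```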
I got $\frac{\partial L^{(t)}}{\partial o_i^{(t)}} = \hat{y}_i^{(t)} - y_i^{(t)}$ as shown here. But how do we get the result in the book?