Define a sequence $(\mathbf{y}_k)_{k=0}^N$ in $\mathbb{R}^n$ such that: $$\mathbf{y}_{k+1} = \mathbf{y}_{k} + \lambda \nabla_\mathbf{y} E(\mathbf{y}_k,\mathbf{w}), \quad k=0,1,\ldots,N-1,$$ where $\lambda$ is a constant, $\mathbf{w}\in\mathbb{R}^m$, and $E:\mathbb{R}^{n+m}\to \mathbb{R}$ is some differentiable function.
Let $Q:\mathbb{R}^{n}\to \mathbb{R}$ be a differentiable function and $L=Q(\mathbf{y}_N)$.
Applying the chain rule we have: $$\frac{dL}{d\mathbf{w}} = \sum_{k=1}^N\frac{\partial \mathbf{y}_k^\top}{\partial \mathbf{w}} \frac{dQ}{d\mathbf{y}_k}\qquad (1)$$ and $$\frac{dQ}{d\mathbf{y}_k} = \frac{\partial \mathbf{y}_{k+1}^\top}{\partial \mathbf{y}_{k}} \frac{dQ}{d\mathbf{y}_{k+1}}.\qquad (2)$$
(Source: this paper, equation (12))
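For concreteness, here is a minimal NumPy sketch of one possible reading of $(1)$ and $(2)$: $\partial$ is taken to mean the Jacobian of a single update step with the other argument held fixed, and $dQ/d\mathbf{y}_k$ the total derivative of $Q(\mathbf{y}_N)$ through all later steps. The functions $E(\mathbf{y},\mathbf{w}) = \sin(\mathbf{y})\cdot(B\mathbf{w})$ and $Q(\mathbf{y}) = \tfrac12\|\mathbf{y}\|^2$, the matrix `B`, and all helper names are toy assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Toy setup (assumed, not from the paper):
# E(y, w) = sin(y) . (B w),  Q(y) = 0.5 * ||y||^2
rng = np.random.default_rng(0)
n, m, N, lam = 3, 2, 5, 0.1
B = rng.standard_normal((n, m))
w = rng.standard_normal(m)
y0 = rng.standard_normal(n)

def step(y, w):
    # y_{k+1} = y_k + lam * grad_y E(y_k, w)
    return y + lam * np.cos(y) * (B @ w)

def step_jac_y(y, w):
    # "partial y_{k+1} / partial y_k": w held fixed, shape (n, n)
    return np.eye(n) - lam * np.diag(np.sin(y) * (B @ w))

def step_jac_w(y, w):
    # "partial y_{k+1} / partial w": y_k held fixed, shape (n, m)
    return lam * np.cos(y)[:, None] * B

def loss(w):
    # L = Q(y_N) as a function of w only
    y = y0
    for _ in range(N):
        y = step(y, w)
    return 0.5 * y @ y

# Forward pass: store y_0, ..., y_N
ys = [y0]
for _ in range(N):
    ys.append(step(ys[-1], w))

# Backward pass
g = ys[N].copy()          # dQ/dy_N = grad Q(y_N) = y_N
grad = np.zeros(m)
for k in range(N, 0, -1):
    grad += step_jac_w(ys[k - 1], w).T @ g   # k-th term of (1)
    g = step_jac_y(ys[k - 1], w).T @ g       # (2): g becomes dQ/dy_{k-1}

# Central finite-difference check of dL/dw
eps = 1e-6
fd = np.array([(loss(w + eps * e) - loss(w - eps * e)) / (2 * eps)
               for e in np.eye(m)])
print(np.max(np.abs(grad - fd)))
```

Under this reading the accumulated `grad` can be compared against the finite-difference estimate `fd`; the check only says that this interpretation is self-consistent, not that it is the notation the paper intends.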
My questions:
- How is $(1)$ obtained? Shouldn't it be $$\frac{dL}{d\mathbf{w}} = \sum_{k=1}^N \frac{d\mathbf{y}_k^\top}{d\mathbf{w}} \frac{\partial Q}{\partial\mathbf{y}_k}?$$ The operators $\partial$ and $d$ seem to be reversed.
- I'm similarly confused about the use of $\partial$ and $d$ in $(2)$. Could you please explain it?
Thank you very much in advance for your help!