Consider the following functions:
\begin{equation} y = \operatorname{softmax}(z) \end{equation} \begin{equation} z = h\cdot W + b \end{equation}
where $y$, $h$, $W$ and $b$ are $1 \times n$, $1 \times m$, $m \times n$ and $1 \times n$ matrices, respectively. Compute $\frac{\partial{y_i}}{\partial{W}}$.
My attempt, using the chain rule:
\begin{equation} \frac{\partial{y_i}}{\partial{W}} = \frac{\partial{y_i}}{\partial{z}} \times \frac{\partial{z}}{\partial{W}} \end{equation}
Here $z$ is a vector and $W$ is a matrix, so $\frac{\partial{z}}{\partial{W}}$ is a 3D tensor.
But $y_i$ is a scalar and $W$ is an $m \times n$ matrix, so $\frac{\partial{y_i}}{\partial{W}}$ should be of size $m \times n$.
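To make the shapes concrete, here is a minimal NumPy sketch of the three derivatives involved. The sizes `m`, `n`, the index `i` and the random inputs are arbitrary choices for illustration, and the row vectors are stored as 1D arrays rather than $1 \times n$ matrices. It uses the standard identities $\frac{\partial y_i}{\partial z_k} = y_i(\delta_{ik} - y_k)$ and $\frac{\partial z_k}{\partial W_{jl}} = h_j \delta_{kl}$, and estimates $\frac{\partial y_i}{\partial W}$ separately by finite differences, so it only reports shapes and does not assume how the factors combine.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(h, W, b):
    # h: (m,), W: (m, n), b: (n,)  ->  y: (n,)
    return softmax(h @ W + b)

m, n, i = 3, 4, 1  # hypothetical sizes and output index
rng = np.random.default_rng(0)
h = rng.normal(size=m)
W = rng.normal(size=(m, n))
b = rng.normal(size=n)

y = forward(h, W, b)

# dy_i/dz_k = y_i * (delta_ik - y_k): one entry per component of z.
dyi_dz = y[i] * (np.eye(n)[i] - y)

# dz_k/dW_jl = h_j * delta_kl: indexed by (k, j, l), hence a 3D tensor.
dz_dW = np.einsum("j,kl->kjl", h, np.eye(n))

# dy_i/dW by central finite differences, one entry of W at a time.
eps = 1e-6
dyi_dW = np.zeros((m, n))
for j in range(m):
    for l in range(n):
        dW = np.zeros((m, n))
        dW[j, l] = eps
        dyi_dW[j, l] = (forward(h, W + dW, b)[i]
                        - forward(h, W - dW, b)[i]) / (2 * eps)

print(dyi_dz.shape, dz_dW.shape, dyi_dW.shape)  # (4,) (4, 3, 4) (3, 4)
```

So numerically $\frac{\partial{z}}{\partial{W}}$ has shape $n \times m \times n$ while $\frac{\partial{y_i}}{\partial{W}}$ has shape $m \times n$, which is exactly the mismatch described above.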
How do I reconcile these two shapes? Where am I going wrong?