
Consider the following functions:

\begin{equation} y = \operatorname{softmax}(z) \end{equation}
\begin{equation} z = h\cdot W + b \end{equation}

where $y$, $h$, $W$ and $b$ are $1 \times n$, $1 \times m$, $m \times n$ and $1 \times n$ matrices, respectively. Compute $\frac{\partial{y_i}}{\partial{W}}$.

My efforts:

\begin{equation} \frac{\partial{y_i}}{\partial{W}} = \frac{\partial{y_i}}{\partial{z}} \times \frac{\partial{z}}{\partial{W}} \end{equation}

Here $z$ is a vector and $W$ is a matrix so $\frac{\partial{z}}{\partial{W}}$ will be a 3D tensor.

But $y_i$ is a scalar and $W$ is an $m \times n$ matrix, so $\frac{\partial{y_i}}{\partial{W}}$ should be of size $m \times n$.

Where am I going wrong?

tourism

1 Answer


Given
$$\eqalign{ z &= hW+b \cr y &= \operatorname{softmax}(z) \cr Y &= \operatorname{Diag}(y) \cr }$$
find the differential and gradient of $y$:
$$\eqalign{ dy &= dz\,(Y-y^Ty) \cr &= h\,dW\,(Y-y^Ty) \cr &= h\,{\mathbb E}\,(Y-y^Ty):dW \cr\cr \frac{\partial y}{\partial W} &= h\,{\mathbb E}\,(Y-y^Ty) \cr }$$
where the colon denotes the double-dot (aka Frobenius) product, and ${\mathbb E}$ is a $4^{th}$-order isotropic tensor with components
$${\mathbb E}_{ijkl} = \delta_{ik}\,\delta_{jl}$$

Also recall that we are working with row vectors, so $(y^Ty)$ is a matrix, not a scalar product.
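
To connect this with the shape question in the post, it may help to write the chain rule in components. The sketch below assumes the standard softmax derivative $\partial y_i/\partial z_j = y_i(\delta_{ij}-y_j)$; the unit row vector $e_i$ ($1$ in position $i$, $0$ elsewhere) is notation introduced here for illustration.

$$\eqalign{ \frac{\partial y_i}{\partial W_{kl}} &= \sum_{j=1}^{n}\frac{\partial y_i}{\partial z_j}\,\frac{\partial z_j}{\partial W_{kl}} \cr &= \sum_{j=1}^{n} y_i\,(\delta_{ij}-y_j)\,h_k\,\delta_{jl} \cr &= h_k\,y_i\,(\delta_{il}-y_l) \cr }$$

So $\frac{\partial z}{\partial W}$ is indeed a 3D tensor, with components $h_k\,\delta_{jl}$, but the chain rule sums over the index $j$ it shares with $\frac{\partial y_i}{\partial z}$. That contraction leaves only the indices $k$ and $l$, which gives the expected $m \times n$ matrix $$\frac{\partial y_i}{\partial W} = y_i\,h^T(e_i - y)$$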


hans
  • Using index notation, @hans's answer becomes $$\eqalign{\frac{\partial y_j}{\partial W_{km}}&=h_i\,{\mathbb E}_{ijkl}\,(Y_{lm}-y_ly_m)\cr&=h_i\,(\delta_{ik}\delta_{jl})\,(Y_{lm}-y_ly_m)\cr&=h_k\,(Y_{jm}-y_jy_m)}$$ – greg Dec 28 '16 at 19:41
  • Thanks !! ... It helped a lot... – tourism Jan 04 '17 at 01:39
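
For anyone who wants to sanity-check the index formula $\frac{\partial y_j}{\partial W_{km}} = h_k\,(Y_{jm}-y_jy_m)$ numerically, here is a minimal NumPy sketch. It is only an illustration: the sizes, the random inputs, and the helper names (`softmax`, `analytic_jacobian`, `numeric_jacobian`) are assumptions made for this example, and the comparison uses plain central finite differences.

```python
import numpy as np

def softmax(z):
    # Subtracting the max avoids overflow and does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def analytic_jacobian(h, W, b):
    # J[j, k, l] = dy_j / dW_kl = h_k * (Y_jl - y_j * y_l)
    y = softmax(h @ W + b)
    S = np.diag(y) - np.outer(y, y)      # S[j, l] = Y_jl - y_j y_l
    return np.einsum("k,jl->jkl", h, S)

def numeric_jacobian(h, W, b, eps=1e-6):
    # Central finite differences, perturbing one entry of W at a time.
    m, n = W.shape
    J = np.zeros((n, m, n))
    for k in range(m):
        for l in range(n):
            dW = np.zeros_like(W)
            dW[k, l] = eps
            y_plus = softmax(h @ (W + dW) + b)
            y_minus = softmax(h @ (W - dW) + b)
            J[:, k, l] = (y_plus - y_minus) / (2 * eps)
    return J

# Hypothetical sizes and random data, chosen only for the check.
rng = np.random.default_rng(0)
m, n = 3, 4
h = rng.normal(size=m)
W = rng.normal(size=(m, n))
b = rng.normal(size=n)

J_analytic = analytic_jacobian(h, W, b)
J_numeric = numeric_jacobian(h, W, b)
max_abs_diff = np.max(np.abs(J_analytic - J_numeric))

print("full Jacobian shape:", J_analytic.shape)    # (n, m, n)
print("one slice dy_i/dW shape:", J_analytic[0].shape)  # (m, n)
print("max abs difference:", max_abs_diff)
```

The full Jacobian has shape $n \times m \times n$, and each slice `J_analytic[i]` is the $m \times n$ matrix $\frac{\partial y_i}{\partial W}$ that the question asks for.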