
I know how to calculate the derivative of the sigmoid function when the input is a scalar. How do I properly derive the derivative of the sigmoid function when the input is a matrix, i.e. using matrix calculus? The fraction (a sort of division) looks weird in there.

Here are my vague ideas (inspired by how it is coded): $$ \sigma(\mathbf{X}) = \frac{1}{1 + \exp(-\mathbf{X})} $$ $$ \begin{split} \mathrm{d} \sigma(\mathbf{X}) & = \frac{-1 \left[ \exp(-\mathbf{X}) \odot \mathrm{d} (-\mathbf{X}) \right]}{\left( 1 + \exp(-\mathbf{X}) \right)^2} =\\ & = \frac{-1 \left[ \exp(-\mathbf{X}) \odot (-\mathbf{1}) \odot \mathrm{d} \mathbf{X} \right]}{\left( 1 + \exp(-\mathbf{X}) \right)^2} = \\ & = \frac{ \mathbf{1} \odot \exp(-\mathbf{X}) \odot \mathrm{d} \mathbf{X} }{\left( 1 + \exp(-\mathbf{X}) \right)^2} = \\ & = \frac{\mathbf{1}}{1 + \exp(-\mathbf{X})} \odot \frac{\exp(-\mathbf{X}) + \mathbf{1} - \mathbf{1} }{1 + \exp(-\mathbf{X})} \odot \mathrm{d} \mathbf{X} =\\ & = \sigma(\mathbf{X}) \odot (\mathbf{1} - \sigma(\mathbf{X})) \odot \mathrm{d} \mathbf{X} \end{split} $$

The result matches how I would code it, yet the derivation just does not seem right.
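One way to sanity-check the last line of the derivation numerically is to compare $\sigma(\mathbf{X})\odot(\mathbf{1}-\sigma(\mathbf{X}))\odot\mathrm{d}\mathbf{X}$ with a central finite difference of $\sigma$ along a direction $\mathrm{d}\mathbf{X}$. This is a minimal sketch that assumes matrices are plain Python lists of rows, so only the standard library is needed. The helper names (`sigmoid_matrix`, `sigmoid_differential`, `finite_difference`) and the test matrices are hypothetical. Every operation is applied entry by entry, which corresponds to reading the fraction bars as entrywise division.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable scalar logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_matrix(X):
    """Apply the sigmoid to every entry of a matrix given as a list of rows."""
    return [[sigmoid(x) for x in row] for row in X]


def sigmoid_differential(X, dX):
    """Return sigma(X) * (1 - sigma(X)) * dX, all products taken entrywise."""
    S = sigmoid_matrix(X)
    return [[s * (1.0 - s) * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(S, dX)]


def finite_difference(X, dX, h=1e-6):
    """Central difference (sigma(X + h*dX) - sigma(X - h*dX)) / (2h), entrywise."""
    plus = sigmoid_matrix([[x + h * d for x, d in zip(xr, dr)] for xr, dr in zip(X, dX)])
    minus = sigmoid_matrix([[x - h * d for x, d in zip(xr, dr)] for xr, dr in zip(X, dX)])
    return [[(p - m) / (2 * h) for p, m in zip(pr, mr)] for pr, mr in zip(plus, minus)]


# Hypothetical test point X and direction dX.
X = [[-2.0, 0.0, 1.5],
     [3.0, -0.5, 0.25]]
dX = [[0.3, -1.0, 2.0],
      [0.5, 0.7, -0.2]]

analytic = sigmoid_differential(X, dX)
numeric = finite_difference(X, dX)
max_err = max(abs(a - n) for ar, nr in zip(analytic, numeric) for a, n in zip(ar, nr))
print(f"max |analytic - numeric| = {max_err:.2e}")
```

At the entry $X_{1,2}=0$ we have $\sigma(0)=1/2$, so the analytic differential there is $\tfrac14\cdot(-1)=-0.25$. The finite-difference error should sit near the limit set by the step size `h`.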

Szpilona
  • $\sigma(X)$ will transform each element $X_{i,j}$ to $\sigma(X_{i,j})$. So taking the derivative with respect to $X$ means taking the derivative of each $\sigma(X_{i,j})$ with respect to $X_{i,j}$ where $X_{i,j}$ is a scalar. Your derivation seems to me to be correct. – Dhruv Kohli Jun 15 '17 at 07:50
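The comment's point is that each output entry depends only on the matching input entry. You can check this numerically by flattening $X$ and $\sigma(X)$ into vectors: the resulting Jacobian should be diagonal, with entries $\sigma(X_{i,j})(1-\sigma(X_{i,j}))$. Below is a hypothetical illustration that uses row-by-row flattening (`vec`) and central differences. The function names and the test matrix are assumptions and do not come from the original post.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable scalar logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_matrix(X):
    """Apply the sigmoid to every entry of a matrix (list of rows)."""
    return [[sigmoid(x) for x in row] for row in X]


def vec(M):
    """Flatten a matrix row by row into a list."""
    return [x for row in M for x in row]


def unvec(v, n_cols):
    """Inverse of vec for a matrix with n_cols columns."""
    return [v[i:i + n_cols] for i in range(0, len(v), n_cols)]


def numerical_jacobian(X, h=1e-6):
    """J[p][q] ~ d vec(sigma(X))[p] / d vec(X)[q], by central differences."""
    n_cols = len(X[0])
    x = vec(X)
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for q in range(n):
        # Perturb only the q-th entry of vec(X) in each direction.
        xp = x.copy()
        xp[q] += h
        xm = x.copy()
        xm[q] -= h
        sp = vec(sigmoid_matrix(unvec(xp, n_cols)))
        sm = vec(sigmoid_matrix(unvec(xm, n_cols)))
        for p in range(n):
            J[p][q] = (sp[p] - sm[p]) / (2 * h)
    return J


X = [[-2.0, 0.0, 1.5],
     [3.0, -0.5, 0.25]]
J = numerical_jacobian(X)
x = vec(X)
n = len(x)

# Largest off-diagonal entry: perturbing x_q leaves every other output untouched.
max_off_diag = max(abs(J[p][q]) for p in range(n) for q in range(n) if p != q)
# Largest deviation of the diagonal from sigma(x)(1 - sigma(x)).
max_diag_err = max(abs(J[p][p] - sigmoid(x[p]) * (1.0 - sigmoid(x[p]))) for p in range(n))
print(max_off_diag, max_diag_err)  # max_off_diag is exactly 0.0
```

The off-diagonal entries come out exactly zero rather than merely small. When $p \neq q$, the $p$-th input is bit-for-bit identical in both perturbed copies, so the difference in the numerator is exactly zero.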

1 Answer


I think things can be made a little simpler than your (correct) derivation. Let $\sigma:\mathbb{R}^{m\times n}\rightarrow\mathbb{R}^{m\times n}$ apply the sigmoid function to each element. Then just follow the chain rule for matrix calculus: $$ \frac{d}{dX}\sigma(X) = \frac{\partial \sigma}{\partial X}\frac{\partial X}{\partial X} = \frac{\partial \sigma}{\partial X} = \sigma(X)\odot[\textbf{1} - \sigma(X)] $$ where $\odot$ is the Hadamard product and $[\textbf{1}]_{ij}=1\;\forall\;i,j$. The final step uses the fact that $\sigma$ acts elementwise. Writing $x_{ij}=[X]_{ij}$, each $\sigma(x_{ij})$ depends on $x_{ij}$ alone, so every cross-derivative $\partial\sigma(x_{ij})/\partial x_{kl}$ with $(k,l)\neq(i,j)$ vanishes. Only the entrywise derivatives remain: $$ \left[\frac{\partial\sigma}{\partial X}\right]_{ij} = \frac{\partial}{\partial x_{ij}}\sigma(x_{ij}) = \sigma(x_{ij})[1 - \sigma(x_{ij})] $$ where the last equality is just the scalar derivative of the sigmoid.
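Written out with all indices, the full derivative of $\sigma$ with respect to $X$ has $(mn)^2$ entries, almost all of them zero. Here is a sketch of that bookkeeping. It uses the Kronecker delta $\delta$ and the $\mathrm{vec}$ operator (stacking entries into one column in any fixed order), which is notation introduced here rather than taken from the answer:

$$
\frac{\partial\,\sigma(x_{ij})}{\partial x_{kl}} = \delta_{ik}\,\delta_{jl}\,\sigma(x_{ij})\bigl[1-\sigma(x_{ij})\bigr],
\qquad
\frac{\partial\,\mathrm{vec}\,\sigma(X)}{\partial\,\mathrm{vec}\,X} = \mathrm{diag}\Bigl(\mathrm{vec}\bigl(\sigma(X)\odot[\textbf{1}-\sigma(X)]\bigr)\Bigr).
$$

So the matrix $\sigma(X)\odot[\textbf{1}-\sigma(X)]$ holds exactly the diagonal of this $mn\times mn$ Jacobian. Multiplying $\mathrm{vec}(\mathrm{d}X)$ by that diagonal matrix is the same as the Hadamard product $\mathrm{d}\sigma(X) = \sigma(X)\odot[\textbf{1}-\sigma(X)]\odot\mathrm{d}X$.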

user3658307