
I found the following information in the linked question "Why is log-of-sum-of-exponentials $f(x)=\log\left(\sum_{i=1}^n e^{x_i}\right)$ a convex function for $x \in\mathbb R^n$?"

A nice fact about the logSumExp function $f$ is that its gradient is the softmax function $S$: $$ \nabla f(x) = S(x) = \begin{bmatrix} \frac{e^{x_1}}{e^{x_1} + \cdots + e^{x_n}} \\ \vdots \\ \frac{e^{x_n}}{e^{x_1} + \cdots +e^{x_n}} \end{bmatrix}. $$ The Hessian of $f$ is the matrix $S'(x)$, and a nice fact about the softmax function is that $$ S'(x) = \text{diag}(S(x)) - S(x) S(x)^T. $$
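As a sanity check on this quoted fact (my own sketch, not from the linked question), I compared finite-difference derivatives of $f$ against $S(x)$ and $\text{diag}(S(x)) - S(x)S(x)^T$ in NumPy; the helpers `num_grad` and `num_hess` are names I made up for this check:

```python
import numpy as np

def f(x):
    """logSumExp: f(x) = log(sum_i exp(x_i))."""
    return np.log(np.sum(np.exp(x)))

def S(x):
    """Softmax: S(x)_i = exp(x_i) / sum_j exp(x_j)."""
    e = np.exp(x)
    return e / e.sum()

def num_grad(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at x."""
    n = x.size
    out = np.zeros(n)
    for i in range(n):
        d = np.zeros(n); d[i] = h
        out[i] = (g(x + d) - g(x - d)) / (2 * h)
    return out

def num_hess(g, x, h=1e-5):
    """Finite-difference Hessian of g at x (central differences of the gradient)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n); d[i] = h
        H[:, i] = (num_grad(g, x + d) - num_grad(g, x - d)) / (2 * h)
    return H

x = np.array([0.3, -1.2, 2.0])
# Gradient of f should match the softmax vector.
print(np.allclose(num_grad(f, x), S(x), atol=1e-5))
# Hessian of f should match diag(S(x)) - S(x) S(x)^T.
print(np.allclose(num_hess(f, x), np.diag(S(x)) - np.outer(S(x), S(x)), atol=1e-4))
```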

But I don't understand why $D^{2}f = S^{\prime}$. For instance, when $n=2$, I have

$$D^{2}f(x) = \left(\begin{array}{cc} \frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} & -\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} \\ -\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} & \frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} \end{array}\right)$$

But $\text{diag}(S)$ is a $2\times 2$ matrix whose entries do not contain the term $\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}}$, while $-S(x) S(x)^T$ looks to me like a real number, so I don't understand how to prove this assertion in the case $n=2$.
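To make concrete which two expressions I am comparing, here is the $n=2$ case written out in NumPy (again my own sketch; `s` is the softmax vector at an arbitrary sample point):

```python
import numpy as np

x1, x2 = 0.7, -0.4
s = np.exp([x1, x2]) / (np.exp(x1) + np.exp(x2))   # softmax for n = 2

# Right-hand side of the claimed identity: diag(S(x)) - S(x) S(x)^T.
rhs = np.diag(s) - np.outer(s, s)                  # a 2x2 matrix

# The Hessian I computed by hand: every entry is +/- e^{x1} e^{x2} / (e^{x1} + e^{x2})^2.
c = np.exp(x1) * np.exp(x2) / (np.exp(x1) + np.exp(x2)) ** 2
lhs = np.array([[ c, -c],
                [-c,  c]])

print(np.allclose(lhs, rhs))   # the two matrices agree numerically at this point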
