
I found the following information in the linked question "Why is log-of-sum-of-exponentials $f(x)=\log\left(\sum_{i=1}^n e^{x_i}\right)$ a convex function for $x \in\mathbb R^n$?"

A nice fact about the logSumExp function $f$ is that its gradient is the softmax function $S$: $$ \nabla f(x) = S(x) = \begin{bmatrix} \frac{e^{x_1}}{e^{x_1} + \cdots + e^{x_n}} \\ \vdots \\ \frac{e^{x_n}}{e^{x_1} + \cdots +e^{x_n}} \end{bmatrix}. $$ The Hessian of $f$ is the matrix $S'(x)$, and a nice fact about the softmax function is that $$ S'(x) = \text{diag}(S(x)) - S(x) S(x)^T. $$
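As a sanity check on this quoted fact (my own sketch, not from the linked question), I compared finite-difference derivatives of $f$ against $S(x)$ and $\text{diag}(S(x)) - S(x)S(x)^T$ in NumPy; the helpers `num_grad` and `num_hess` are names I made up for this check:

```python
import numpy as np

def f(x):
    """logSumExp: f(x) = log(sum_i exp(x_i))."""
    return np.log(np.sum(np.exp(x)))

def S(x):
    """Softmax: S(x)_i = exp(x_i) / sum_j exp(x_j)."""
    e = np.exp(x)
    return e / e.sum()

def num_grad(g, x, h=1e-6):
    """Central-difference gradient of a scalar function g at x."""
    n = x.size
    out = np.zeros(n)
    for i in range(n):
        d = np.zeros(n); d[i] = h
        out[i] = (g(x + d) - g(x - d)) / (2 * h)
    return out

def num_hess(g, x, h=1e-5):
    """Finite-difference Hessian of g at x (central differences of the gradient)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        d = np.zeros(n); d[i] = h
        H[:, i] = (num_grad(g, x + d) - num_grad(g, x - d)) / (2 * h)
    return H

x = np.array([0.3, -1.2, 2.0])
# Gradient of f should match the softmax vector.
print(np.allclose(num_grad(f, x), S(x), atol=1e-5))
# Hessian of f should match diag(S(x)) - S(x) S(x)^T.
print(np.allclose(num_hess(f, x), np.diag(S(x)) - np.outer(S(x), S(x)), atol=1e-4))
```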

But I don't understand why $D^{2}f = S^{\prime}$. For instance, when $n=2$, I have

$$D^{2}f(x) = \left(\begin{array}{cc} \frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} & -\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} \\ -\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} & \frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}} \end{array}\right)$$

But $\text{diag}(S)$ is a $2\times 2$ matrix whose entries do not contain the term $\frac{e^{x_{1}}e^{x_{2}}}{\left(e^{x_{1}}+ e^{x_{2}}\right)^{2}}$, while $-S(x) S(x)^T$ looks to me like a real number, so I don't understand how to prove this assertion in the case $n=2$.
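To make concrete which two expressions I am comparing, here is the $n=2$ case written out in NumPy (again my own sketch; `s` is the softmax vector at an arbitrary sample point):

```python
import numpy as np

x1, x2 = 0.7, -0.4
s = np.exp([x1, x2]) / (np.exp(x1) + np.exp(x2))   # softmax for n = 2

# Right-hand side of the claimed identity: diag(S(x)) - S(x) S(x)^T.
rhs = np.diag(s) - np.outer(s, s)                  # a 2x2 matrix

# The Hessian I computed by hand: every entry is +/- e^{x1} e^{x2} / (e^{x1} + e^{x2})^2.
c = np.exp(x1) * np.exp(x2) / (np.exp(x1) + np.exp(x2)) ** 2
lhs = np.array([[ c, -c],
                [-c,  c]])

print(np.allclose(lhs, rhs))   # the two matrices agree numerically at this point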
