23

How to prove $f(x)=\log\left(\displaystyle\sum_{i=1}^n e^{x_i}\right)$ is a convex function?

EDIT1: For the above function $f(x)$, the following inequalities hold:

$$\max\{x_1,x_2,\ldots,x_n\}\leqslant f(x)\leqslant\max\{x_1,x_2,\ldots,x_n\}+\log n$$

and I have tried to prove its convexity from the definition of a convex function using the above inequalities, but that didn't work.
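As a quick numerical sanity check of the sandwich bound above (a spot check, not a proof), here is a minimal NumPy sketch; the helper name `logsumexp` and the test vector are arbitrary choices:

```python
import numpy as np

def logsumexp(x):
    # f(x) = log(sum_i exp(x_i)), computed with the usual max-shift for stability
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([0.3, -1.2, 2.5, 0.0])
# max(x) <= f(x) <= max(x) + log(n)
print(np.max(x) <= logsumexp(x) <= np.max(x) + np.log(len(x)))  # True
```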

EDIT2: I have posted my own answer below.

Finley
  • 1,133
  • Composition rules for preserving convexity. But it didn't work – Finley Sep 06 '17 at 04:18
  • What about using induction? – Sergio Enrique Yarza Acuña Sep 06 '17 at 04:53
  • A possible duplicate of https://math.stackexchange.com/questions/2416837/the-second-derivative-of-log-left-sum-limits-i-1nex-i-right-seems-neg – Math Lover Sep 06 '17 at 04:58
  • @MathLover : It seems that I need to compute the Hessian of $f(x)$ that I tried to avoid before. – Finley Sep 06 '17 at 05:10
  • @Finley At this moment, I can't think of any other way to prove that the function is convex. The individual entries of the Hessian matrix are given in https://math.stackexchange.com/questions/2416837/the-second-derivative-of-log-left-sum-limits-i-1nex-i-right-seems-neg. A proof is also given based on the C-S inequality. – Math Lover Sep 06 '17 at 05:13
  • @Finley The Hessian matrix is also given in https://web.stanford.edu/class/ee364a/lectures/functions.pdf (check the 10th slide). A proof (based on the C-S inequality) is also given there. – Math Lover Sep 06 '17 at 05:19
  • @MathLover : Thanks for your patience! I get it. – Finley Sep 06 '17 at 05:25
  • The convexity of the log-sum-exp function can also be proved using the gradient inequality and the Hessian criterion (the Hessian is PSD). All you need is the Cauchy–Schwarz inequality for both. – Alex Shtoff Sep 06 '17 at 08:36

6 Answers

26

Proof:

Let $u_i=e^{x_i}$, $v_i=e^{y_i}$. Then $$f(\theta x+(1-\theta)y)=\log\left(\sum_{i=1}^n e^{\theta x_i + (1-\theta)y_i}\right)=\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right).$$

From Hölder's inequality:

$$\sum_ {i=1}^n x_iy_i \le (\sum_ {i=1}^n|x_i|^p)^{\frac{1}{p}} \cdot (\sum_ {i=1}^n|y_i|^q)^{\frac{1}{q}}$$ where $1/p+1/q=1$.

Applying this inequality to $f(\theta x+(1-\theta)y)$: $$\log\left(\sum_{i=1}^n u_i^{\theta} v_i^{1-\theta}\right) \le \log\left[\left(\sum_{i=1}^n u_i^{\theta \cdot \frac{1}{\theta}}\right)^{\theta} \cdot \left(\sum_{i=1}^n v_i^{(1-\theta) \cdot \frac{1}{1-\theta}}\right)^{1-\theta}\right].$$ The right-hand side reduces to

$$\theta \log\left(\sum_ {i=1}^n u_i\right)+(1-\theta)\log \left(\sum_ {i=1}^n v_i \right)$$

Here I regard $\theta$ as $\frac{1}{p}$ and $1-\theta$ as $\frac{1}{q}$.

So we obtain $f(\theta x+(1-\theta)y) \le \theta f(x) + (1-\theta)f(y)$, which is exactly the definition of convexity.
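As an aside (not part of the proof), here is a quick numerical spot check of this inequality using SciPy's `scipy.special.logsumexp`; the dimension, seed, and tolerance are arbitrary:

```python
import numpy as np
from scipy.special import logsumexp  # f(x) = log(sum_i exp(x_i))

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    theta = rng.uniform()
    # convexity: f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y)
    lhs = logsumexp(theta * x + (1 - theta) * y)
    rhs = theta * logsumexp(x) + (1 - theta) * logsumexp(y)
    assert lhs <= rhs + 1e-12
print("convexity inequality held on all random samples")
```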

AspiringMat
  • 2,483
Finley
  • 1,133
16

It is enough to show that $$\frac{1}{2} \log (\sum \exp x_i) + \frac{1}{2}\log (\sum \exp y_i)\ge \log (\sum \exp\frac{x_i+y_i}{2})$$ or, with the substitution $\exp\frac{x_i}{2} = a_i$, $\exp\frac{y_i}{2} = b_i$ $$(\sum a_i^2)^{\frac{1}{2}}(\sum b_i^2)^{\frac{1}{2}}\ge \sum a_i b_i$$
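To spell out the reduction: exponentiating the first inequality turns it into

$$\left(\sum e^{x_i}\right)^{\frac{1}{2}}\left(\sum e^{y_i}\right)^{\frac{1}{2}} \ge \sum e^{\frac{x_i+y_i}{2}} = \sum e^{\frac{x_i}{2}}\, e^{\frac{y_i}{2}},$$

which, with $a_i = \exp\frac{x_i}{2}$ and $b_i = \exp\frac{y_i}{2}$, is exactly the Cauchy–Schwarz inequality displayed above. This proves midpoint convexity; since $f$ is continuous, convexity for every $\lambda \in [0,1]$ follows (see the comments below).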

orangeskid
  • 53,909
  • This is much nicer than the other answers! – user1551 Apr 12 '20 at 04:57
  • @user1551: Thanks! I heard about it online, also that it's somehow hard to get... turns out it's just an old friend of ours... – orangeskid Apr 12 '20 at 05:03
  • It looks like you applied the definition of convexity? $f(\lambda x + (1-\lambda) y) \leq \lambda f(x) + (1 - \lambda)f(y)$. But what is the $\frac{1}{2}$ in this case? It seems that it's $\lambda$? If so, why do you fix $\lambda$ instead of leaving it generic? – David Jul 04 '20 at 15:46
  • @David: once it works for $\lambda=\frac{1}{2}$, one can show that it works for any $\lambda\in [0,1]$, using the continuity of $f$. It is a standard fact for convex functions. The case of general $\lambda$ is equivalent to Hölder's inequality, so that would be an alternate approach. – orangeskid Jul 04 '20 at 18:24
  • @orangeskid Any chance you can provide a citation for this $\lambda=1/2$ trick? Seems like it could be useful to prove convexity in many cases. – a06e Apr 23 '23 at 14:26
  • @a06e: maybe Inequalities by Hardy et al.? – orangeskid Apr 23 '23 at 17:05
6

Another way to prove the convexity of this function is to verify the defining inequality of convexity (the two-point case of Jensen's inequality): $f$ is convex if and only if

$$f(tX+(1-t)Y) \le t f(X) + (1-t)f(Y)$$

for all $X$, $Y$ and all $t \in [0,1]$.

Now let $X$ be represented by the vector $({X_1, X_2, X_3,... X_n})$,

and let $Y$ be represented by the vector $({Y_1, Y_2, Y_3,... Y_n})$

Let $t = \dfrac{1}{2}$

$$\text{LHS} = f(tX+(1-t)Y) = \log\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\right)+ \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{Y_i}\right)$$

$$\text{RHS} = \frac{1}{2} \log\left(\sum_{i = 1}^{n} e^{X_i}\sum_{i = 1}^{n} e^{Y_i}\right)$$

Exponentiating both sides, it suffices to show $$\left(\sum_{i=1}^{n} e^{\frac{X_i+Y_i}{2}}\right)^2 \le \left(\sum_{i=1}^{n} e^{X_i}\right)\left(\sum_{i=1}^{n} e^{Y_i}\right).$$ Expanding both sides over index pairs, the diagonal terms agree, and each off-diagonal pair on the left satisfies $2\,e^{\frac{X_i+Y_i}{2}}e^{\frac{X_j+Y_j}{2}} = 2\sqrt{e^{X_i+Y_j}\,e^{X_j+Y_i}} \le e^{X_i+Y_j}+e^{X_j+Y_i}$ by AM–GM, so the RHS dominates the LHS term by term. Hence $\text{LHS} \le \text{RHS}$, $f$ is midpoint convex, and together with continuity this gives convexity.

  • How do you prove the inequality holds for all $0 \le t \le 1$, rather than only $t=1/2$? – Finley Sep 06 '17 at 06:10
  • According to the definition of convexity, $f(x)$ is convex if and only if the inequality holds for every $t \in [0,1]$ – Finley Sep 06 '17 at 06:13
  • What $t \in (0,1)$ means is that you are taking a point between the vectors $X$ and $Y$; any such point corresponds to some $t$ between 0 and 1. The argument applies not only for $t = 0.5$ but for any $t$ within $(0,1)$, which is essentially what you want – Satish Ramanathan Sep 06 '17 at 06:26
  • (+1) It is also a consequence of CS-inequality: $$ \sum_{i=1}^{n} e^{X_i/2}e^{Y_i/2} \leq \left( \sum_{i=1}^{n} e^{X_i} \right)^{1/2}\left( \sum_{i=1}^{n} e^{Y_i} \right)^{1/2} $$ This suggests that for general $t \in [0, 1]$ the same proof works by using Hölder's inequality instead. – Sangchul Lee Sep 06 '17 at 06:37
  • I don't follow. Why is $\log(\sum_{i=1}^{n}e^{\frac{X_i+Y_i}2})=\frac12\log(\sum_{i=1}^{n}e^{X_i})+\log(\sum_{i=1}^ne^{Y_i})$? – user1551 Sep 06 '17 at 06:49
  • It is not equal but less than or equal to (I have evaluated the RHS and LHS separately) – Satish Ramanathan Sep 06 '17 at 06:50
  • Oh, I see. You are trying to reduce the problem statement. OK, thanks. – user1551 Sep 06 '17 at 06:54
  • You have only shown midpoint convexity – Sridhar Thiagarajan Nov 16 '18 at 05:44
6

This answer is similar to the answer written by @Nicholas, but I'm including more details.

A nice fact about the logSumExp function $f$ is that its gradient is the softmax function $S$: $$ \nabla f(x) = S(x) = \begin{bmatrix} \frac{e^{x_1}}{e^{x_1} + \cdots + e^{x_n}} \\ \vdots \\ \frac{e^{x_n}}{e^{x_1} + \cdots +e^{x_n}} \end{bmatrix}. $$ The Hessian of $f$ is the matrix $S'(x)$, and a nice fact about the softmax function is that $$ S'(x) = \text{diag}(S(x)) - S(x) S(x)^T. $$ If we can show that $S'(x)$ is positive semidefinite, it will follow that $f$ is convex.

Edit: At this point, I recommend reading @Bruno-84’s proof, which is superior to the argument that I gave below.

Original argument:

In other words, we need to show that if $v \in \mathbb R^n$, then $v^T S'(x) v \geq 0$. But notice that \begin{align} & v^T S'(x) v \geq 0 \\ \iff & v^T \text{diag}(S(x)) v \geq v^T S(x) S(x)^T v \\ \iff & \sum_{i=1}^n \left( \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}\right) v_i^2 \geq (S(x)^T v )^2 \\ \iff & \sum_{i=1}^n \left( \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}}\right) v_i^2 \geq \left( \sum_{i=1}^n v_i \cdot \frac{e^{x_i}}{e^{x_1} + \cdots + e^{x_n}} \right)^2 \\ \iff & \left(\sum_{i=1}^n e^{x_i} v_i^2 \right) \left(\sum_{i=1}^n e^{x_i}\right) \geq \left(\sum_{i=1}^n v_i e^{x_i} \right)^2 \end{align} This last inequality is true, as can be seen by applying the Cauchy-Schwarz inequality to the vectors $$ a = \begin{bmatrix} \sqrt{e^{x_1}} \\ \vdots \\ \sqrt{e^{x_n}} \end{bmatrix}, \quad b = \begin{bmatrix} v_1 \sqrt{e^{x_1}} \\ \vdots \\ v_n \sqrt{e^{x_n}} \end{bmatrix}. $$
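To complement the algebra, here is a minimal NumPy sketch (my addition, not part of the original answer) that spot-checks the two facts used above: the gradient of $f$ matches the softmax $S(x)$, and $\operatorname{diag}(S(x)) - S(x)S(x)^T$ has no negative eigenvalues. The test point, step size, and tolerances are arbitrary:

```python
import numpy as np

def f(x):
    # log-sum-exp with the usual max-shift for numerical stability
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

x = np.array([0.5, -1.0, 2.0, 0.3])
S = softmax(x)

# gradient check: forward differences of f should match S(x)
eps = 1e-6
grad_fd = np.array([(f(x + eps * np.eye(len(x))[i]) - f(x)) / eps for i in range(len(x))])
print(np.allclose(grad_fd, S, atol=1e-4))  # True

# Hessian check: diag(S) - S S^T should be positive semidefinite
H = np.diag(S) - np.outer(S, S)
print(np.all(np.linalg.eigvalsh(H) >= -1e-12))  # True
```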

littleO
  • 51,938
2

For a twice-differentiable multivariate function, convexity is equivalent to the Hessian matrix being positive semi-definite. That is, you can compute $\nabla^2 f(\mathbf{x})$ here and show that it is positive semi-definite.

This can be proved using the Cauchy–Schwarz inequality, as shown here.

Nicholas
  • 363
  • If anyone doesn't understand how the C-S inequality is applied at the end of the linked slide, Boyd's book section 3.1.5 has a proof of this as well. – Mong H. Ng Sep 08 '19 at 23:48
  • I also just posted an answer here that gives more details about this proof, including how the Cauchy-Schwarz inequality is applied in this case. – littleO Apr 12 '20 at 05:04
2

I have a preference for proving that the Hessian matrix is positive semidefinite, as fully developed by @littleO. Once you have shown that the gradient is the softmax function $S(x)$ and the Hessian matrix has the expression $$ \nabla^2 f(x) = \operatorname{diag}(S(x)) - S(x)S(x)^T, $$ you can simply use the convexity of the real map $t\mapsto t^2$ to conclude that $\nabla^2 f(x)$ is positive semidefinite: since $S(x)^T v$ is a convex combination of the coordinates of the vector $v$ (that is, $S(x)_k\geq 0$ and $\sum_{k=1}^n S(x)_k =1$), one has $$ v^T S(x) S(x)^T v = (S(x)^T v)^2 = \left(\sum_{k=1}^n S(x)_k v_k\right)^2 \leq \sum_{k=1}^n S(x)_k v_k^2 = v^T \operatorname{diag}(S(x)) v. $$
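As a small numerical aside (my addition), the key scalar inequality above holds for any probability vector, not just the softmax; here is a quick NumPy spot check with random weights (the seed and dimension are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.random(5)
    p /= p.sum()               # a probability vector playing the role of S(x)
    v = rng.normal(size=5)
    # Jensen for t -> t^2 with weights p: (sum_k p_k v_k)^2 <= sum_k p_k v_k^2
    assert (p @ v) ** 2 <= p @ v**2 + 1e-12
print("weighted Jensen inequality held on all samples")
```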