Basic question on Bregman divergences and strong convexity

Question

I have seen the following claim in several research papers:

Let $p \in (1,2]$, and consider the function $\Phi(x)=\frac{1}{2}\|x\|^2_p$ (where $x\in\mathbb{R}^d$). Then the corresponding Bregman divergence $D_\Phi$ is $(p-1)$-strongly convex w.r.t. the $\ell_p$ norm, in the sense that $$D_\Phi(y,x) \geq \frac{p-1}{2} \|x-y\|^2_p$$ for all $x,y\in\mathbb{R}^d$.

Typically, no other detail is given, besides "cf. Ball et al. (1994)." I cannot see how that follows immediately from that paper, however. Is it obvious?
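A quick numerical sanity check of the claim (not a proof) can be sketched in plain Python. The helper names `norm_p`, `phi`, `grad_phi`, `bregman`, and `worst_ratio` are hypothetical, and the entrywise gradient $\nabla\Phi(x) = \|x\|_p^{2-p}\,\mathrm{sign}(x)\,|x|^{p-1}$ (valid for $p>1$, and $0$ at $x=0$) is derived directly from $\Phi(x) = \frac12\big(\sum_i |x_i|^p\big)^{2/p}$ rather than taken from the cited papers:

```python
import math
import random

def norm_p(x, p):
    """l_p norm of a list of floats."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def phi(x, p):
    """Phi(x) = 0.5 * ||x||_p^2."""
    return 0.5 * norm_p(x, p) ** 2

def grad_phi(x, p):
    """Gradient of Phi for p > 1: ||x||_p^(2-p) * sign(x_i) * |x_i|^(p-1)."""
    n = norm_p(x, p)
    if n == 0.0:
        return [0.0] * len(x)
    return [n ** (2 - p) * math.copysign(abs(t) ** (p - 1), t) for t in x]

def bregman(y, x, p):
    """D_Phi(y, x) = Phi(y) - Phi(x) - <grad Phi(x), y - x>."""
    g = grad_phi(x, p)
    inner = sum(gi * (yi - xi) for gi, yi, xi in zip(g, y, x))
    return phi(y, p) - phi(x, p) - inner

def worst_ratio(p, d=5, trials=5000, seed=0):
    """Smallest observed D_Phi(y, x) / (0.5 * ||x - y||_p^2) over random pairs."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        gap = norm_p([a - b for a, b in zip(x, y)], p)
        if gap == 0.0:
            continue  # skip the degenerate pair x == y
        worst = min(worst, bregman(y, x, p) / (0.5 * gap ** 2))
    return worst
```

The claim predicts `worst_ratio(p) >= p - 1` for every $p \in (1,2]$; for $p = 2$ the ratio is $1$ up to rounding, since then $D_\Phi(y,x) = \frac12\|x-y\|_2^2$. Random sampling only probes typical pairs, so it cannot certify that $p-1$ is the sharp constant.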

Note: related to (but different from) https://math.stackexchange.com/questions/4034958/strong-convexity-of-l-p-induced-bregman-divergence — Clement C., Feb 24 '21 at 20:56
If I'm not mistaken, Appendix 1 in this paper contains a proof of the strong convexity of $\frac{1}{2} \| x \|_p^2$, from which the inequality follows. — VHarisop, Mar 01 '21 at 05:46
@VHarisop That's useful -- thank you! But I'm interested mainly in how one can derive this from the paper of Ball, Carlen, and Lieb (1994), since that is the one I have seen credited for that in several places. — Clement C., Mar 01 '21 at 05:54

score 4 · Accepted Answer · answered Mar 02 '21 at 07:17

I believe this is just the fact that for continuous convex functions midpoint strong convexity is equivalent to strong convexity.

For instance, say that a function $g \colon \mathbf{R}^n \to \mathbf{R}$ is $\lambda$-midpoint strongly convex w.r.t. $\|\cdot\|$ if $$ (\star) \qquad g\Big(\frac{u + v}{2}\Big) \leq \frac{g(u) + g(v)}{2} - \frac{\lambda}{8} \|u - v\|^2, \quad \mbox{for all}~u, v \in \mathbf{R}^n.$$ Substituting $u = x + y$, $v = x - y$ and using the homogeneity of the norm ($\|u - v\|^2 = 4\|y\|^2$), this definition is equivalent to $$ \frac{g(x + y) + g(x - y)}{2} \geq g(x) + \frac{\lambda}{2} \|y\|^2, \quad \mbox{for all}~x, y \in \mathbf{R}^n. $$ In this terminology, Proposition 3 of Ball, Carlen, and Lieb (1994) says that the function $f_p \colon u \mapsto \tfrac{1}{2} \|u\|_p^2$ is $(p-1)$-midpoint strongly convex w.r.t. $\|\cdot\|_p$.
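Written out for $g = f_p$ and $\lambda = p-1$ (multiplying the second form by $2$), the midpoint condition is the left-hand side of the chain below. Assuming Proposition 3 is stated with $p$-th powers, as in the middle term, the squared form follows from Jensen's inequality, because $t \mapsto t^{2/p}$ is convex on $[0,\infty)$ when $p \le 2$:

$$\frac{\|x+y\|_p^2 + \|x-y\|_p^2}{2} \;\ge\; \left(\frac{\|x+y\|_p^p + \|x-y\|_p^p}{2}\right)^{2/p} \;\ge\; \|x\|_p^2 + (p-1)\|y\|_p^2, \qquad x, y \in \mathbf{R}^n.$$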

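Therefore, to establish your claim, it suffices to verify that $(\star)$ (i.e., midpoint strong convexity) is equivalent to strong convexity for continuous functions such as $f_p$; the bound on $D_\Phi$ then follows from the first-order characterization of strong convexity.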
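The last step, from strong convexity of $\Phi$ to the bound on $D_\Phi$, can be sketched as follows, assuming strong convexity with modulus $\lambda$ (here $\lambda = p-1$, norm $\|\cdot\|_p$) in the usual form and using that $\Phi$ is differentiable for $p > 1$. For $t \in (0,1)$,

$$\Phi\big(x + t(y-x)\big) \le (1-t)\,\Phi(x) + t\,\Phi(y) - \frac{\lambda}{2}\,t(1-t)\,\|y-x\|^2 .$$

Subtracting $\Phi(x)$ and dividing by $t$ gives

$$\frac{\Phi\big(x + t(y-x)\big) - \Phi(x)}{t} \le \Phi(y) - \Phi(x) - \frac{\lambda}{2}\,(1-t)\,\|y-x\|^2 ,$$

and letting $t \to 0^+$ turns the left-hand side into $\langle \nabla\Phi(x), y-x\rangle$, so

$$D_\Phi(y,x) = \Phi(y) - \Phi(x) - \langle \nabla\Phi(x), y-x\rangle \ge \frac{\lambda}{2}\,\|y-x\|^2 .$$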

The equivalence can be verified by modifying the proof in https://math.stackexchange.com/questions/1002248/if-f-is-continuous-and-f-big-frac12xy-big-le-frac12-big-fx in the obvious way. It is quite tedious to write down, however. — Drew Brady, Mar 02 '21 at 07:37
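One way to carry out that modification, sketched as an outline rather than a full proof: write $w_t = t u + (1-t) v$ and show, for every dyadic $t = k/2^n \in [0,1]$,

$$ g(w_t) \le t\,g(u) + (1-t)\,g(v) - \frac{\lambda}{2}\,t(1-t)\,\|u-v\|^2 . \tag{$\dagger$}$$

It holds trivially for $t \in \{0,1\}$. If $(\dagger)$ holds at $a = t-h$ and $b = t+h$ with $h = 2^{-(n+1)}$, then $w_t = (w_a + w_b)/2$ and $\|w_a - w_b\| = 2h\|u-v\|$, so $(\star)$ gives

$$ g(w_t) \le \frac{g(w_a) + g(w_b)}{2} - \frac{\lambda}{2}\,h^2\,\|u-v\|^2 \le t\,g(u) + (1-t)\,g(v) - \frac{\lambda}{2}\Big[\frac{a(1-a) + b(1-b)}{2} + h^2\Big]\|u-v\|^2 ,$$

and $\frac{a(1-a)+b(1-b)}{2} + h^2 = (t - t^2 - h^2) + h^2 = t(1-t)$. Continuity of $g$ then extends $(\dagger)$ from dyadic $t$ to all $t \in [0,1]$.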
I see. I didn't think to go that way (prove the thing on $\Phi$, not directly on the divergence), it seems obvious in retrospect. Thank you! — Clement C., Mar 02 '21 at 07:44
Ah, no problem! It is probably worth noting here that the claim on the divergence is in fact equivalent to $\tfrac{1}{2} \|\cdot\|_p^2$ being $(p-1)$-strongly convex w.r.t. $\|\cdot\|_p$. — Drew Brady, Mar 02 '21 at 07:51

score 1 · Answer 2 · edited Mar 02 '21 at 08:43

Some progress (?).

For any $\delta\in\mathbb{R}^d$, average the divergences at $x+\delta$ and $x-\delta$: the linear terms $\langle \nabla\Phi(x), \pm\delta\rangle$ cancel, leading to $$ \frac{D_\Phi(x+\delta,x)+D_\Phi(x-\delta,x)}{2} = \frac{1}{2}\left(\frac{\|x+\delta\|_p^2+\|x-\delta\|_p^2}{2}-\|x\|_p^2\right), $$ and the inequality of Ball, Carlen, and Lieb (Proposition 3 in the paper, Eq. (2.18)) then gives $$ \frac{D_\Phi(x+\delta,x)+D_\Phi(x-\delta,x)}{2} \geq \frac{p-1}{2}\|\delta\|_p^2. \tag{1} $$
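The averaging identity and inequality (1) can be spot-checked numerically with a small plain-Python sketch; `norm_p`, `bregman`, and `check_symmetrized` are hypothetical helper names, and random sampling is evidence, not a proof:

```python
import math
import random

def norm_p(x, p):
    """l_p norm of a list of floats."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def bregman(y, x, p):
    """D_Phi(y, x) for Phi = 0.5 * ||.||_p^2 with p > 1."""
    nx = norm_p(x, p)
    if nx == 0.0:
        grad = [0.0] * len(x)
    else:
        grad = [nx ** (2 - p) * math.copysign(abs(t) ** (p - 1), t) for t in x]
    inner = sum(g * (a - b) for g, a, b in zip(grad, y, x))
    return 0.5 * norm_p(y, p) ** 2 - 0.5 * nx ** 2 - inner

def check_symmetrized(p, d=4, trials=2000, seed=1):
    """Return (largest error in the averaging identity, smallest slack in (1))."""
    rng = random.Random(seed)
    max_err, min_slack = 0.0, math.inf
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        delta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [a + b for a, b in zip(x, delta)]
        x_minus = [a - b for a, b in zip(x, delta)]
        # Left side: average of the divergences at x + delta and x - delta.
        lhs = 0.5 * (bregman(x_plus, x, p) + bregman(x_minus, x, p))
        # Right side of the identity, with the gradient terms cancelled.
        rhs = 0.5 * (0.5 * (norm_p(x_plus, p) ** 2 + norm_p(x_minus, p) ** 2)
                     - norm_p(x, p) ** 2)
        max_err = max(max_err, abs(lhs - rhs))
        # Slack in inequality (1); it should not be negative beyond rounding.
        min_slack = min(min_slack, lhs - 0.5 * (p - 1) * norm_p(delta, p) ** 2)
    return max_err, min_slack
```

For $p = 2$ the slack is zero up to rounding, since both sides of (1) equal $\frac12\|\delta\|_2^2$.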

Setting $\delta := x-y$ in (1), we get $$ \frac{D_\Phi(2x-y,x)+D_\Phi(y,x)}{2} \geq \frac{p-1}{2}\|x-y\|_p^2, \tag{2} $$ which is related to, but weaker than, what we want, since (2) only bounds the average of $D_\Phi(2x-y,x)$ and $D_\Phi(y,x)$, not $D_\Phi(y,x)$ itself.
