1

Let $\ell_H: \mathbb R \rightarrow \mathbb R_{\infty}, x\mapsto \max\{0, 1-x\}$, which is the so-called "hinge loss function" (which is important in the context of machine learning and SVM's, but that's not what I want to talk about here).

In an exercise, we're supposed to calculate the so-called "Fenchel conjugate" of this function, which was defined in our lecture notes as follows:

Definition: Let $J: X\rightarrow\mathbb R_{\infty}$ be proper. The convex conjugate (or Fenchel conjugate) $J^{\star}$ is defined as $$J^{\star}(p) = \sup_{x\in X}\left(\left\langle p | x\right\rangle - J(x)\right).$$

One thing missing in the Def. is that $J^{\star}: X^{\star} \rightarrow \mathbb R_{\infty}$, so $J^{\star}$ "operates" on the dual space of $X$.. The solution is supposed to be $$\ell_{H}^{\star}(p) = \begin{cases} p, \quad p\in \left[-1, 0\right], \\ \infty, \ \ \text{otherwise} \end{cases},$$ but I don't obtain this (yet). Here is my approach:

$$\ell^{\star}_{H}(p) = \sup_{x\in \mathbb R}\{ \langle p | x \rangle- \ell_{H}(x) \} = \sup_{x\in \mathbb R}\{ \langle p | x \rangle- \max\{0, 1-x\}\}.$$ To the best of my understanding, $\langle p| x \rangle = p(x)$, since $p\in \mathbb R^{\star}$, and since we can write every element $p$ of $\mathbb R^{\star}$ in the form $p(x) = mx$, where $m\in \mathbb{R}$, we get:

$$\ell^{\star}_{H}(p) = \sup_{x\in \mathbb R}\{ \langle p | x \rangle- \ell_{H}(x) \} = \sup_{x\in \mathbb R}\{ mx - \max\{0, 1-x\}\}.$$

For $x\geq 1$, we have $\max\{0, 1-x\} \geq 0$, and thus $mx - \max\{0, 1-x\} = mx - (1 - x) = (m+1)x - 1$. Thus, for $x\geq 1$, the $\sup$ yields $\infty$, and for $x < 0$, it should yield the same (because then we only have $mx - 0 = mx$).

Apparently, my thinking is wrong, which leads me to the following questions:

(1) What does it (in the solution from above) mean that $p\in [-1, 0]$? After all, $p$ is an element of the dual space $\mathbb R^{\star}$. Maybe it means that $p(x) \in [-1, 0]$?

(2) Why does the Fenchel conjugate yield $p$ for $p\in [-1, 0]$?

Hermi
  • 692
  • You're correct that it doesn't make sense to say that $p \in [-1,0]$, because $p$ is a linear functional rather than a number. However, there is a number $\hat p$ such that $p(x) = \hat p x$, and it is common to "not distinguish" in our notation between $p$ and $\hat p$. So really they should have said $\hat p \in [-1,0]$. I think whoever wrote the solution was not being very careful. – littleO Jun 16 '21 at 18:42
  • You might be interested in reading the intuitive explanation of the convex conjugate that I described here:https://math.stackexchange.com/a/624633/40119 – littleO Jun 16 '21 at 18:48
  • @littleO Distinguishing strictly between the dual space of linear functionals and the identified space is just distracting here. Just identify the dual with the real numbers and move on. And yes, identifying both is rigorous and not doing so is unnecessary nitpicking. – Dirk Jun 16 '21 at 18:48
  • It's a matter of opinion, I suppose, but it has been a source of confusion for me in the past and it seems to have been a source of confusion in this case also. But, instead of saying the solution writer was not being careful, I probably should have said they just made an aesthetic choice not to distinguish between those two quantities in their notation, which is a choice often made by math authors, including the best math authors. – littleO Jun 16 '21 at 18:56
  • Tangential comment: when I first read that an author was going to "not distinguish between" distinct mathematical objects, I was confused because I didn't know if they meant "not distinguish between them conceptually" or just "not distinguish between them in our notation". Using the same symbol to denote distinct but related objects makes sense, so I can understand not distinguishing in notation. But I was confused by the idea of "identifying" distinct objects conceptually rather than just in notation -- because in my mind I know they are not the same object, and I can't stop knowing that. – littleO Jun 16 '21 at 19:25

1 Answers1

0

Your argument starts right, but then you make a wrong turn. You should start with some fixed $m$ and then check where the function $h(x) = mx - \max(0,1-x)$ has its supremum and when you check, you consider the cases $1-x>0$ and $<0$. If $m>0$, then the function $h$ is unbounded from above (for $x\to\infty$). if $m\leq 0$, then it depends on how negative $m$ is: For $m<-1$ the function $h$ is again unbounded from above (this time for $x\to-\infty$). Finally, check the case $-1\leq m\leq 0$ and note that $h(x)\leq 0$ with $h(0)=0$. All this together shows that the conjugate looks as claimed.

Dirk
  • 11,680