The context of my question has applications in machine learning, particularly SVMs, but that is not what I want to talk about here. Instead, I would like to focus on the mathematics.
So: Let $\ell_H: \mathbb R\rightarrow \mathbb R_{\infty}$ be the hinge loss $\ell_H(x) = \max\{0, 1-x\}$. Let $J:\mathbb R^m\rightarrow \mathbb R_{\infty}$ be the function (called "loss function" in machine learning) $J(z) = \sum_{i = 1}^{m}\ell_H(z_i)$. In an exercise, we are supposed to derive the dual problem for this loss function $J$. From our lecture notes:
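For concreteness, the definitions above can be evaluated numerically. Here is a minimal Python sketch (the helper names `hinge` and `J` are my own, not from the lecture):

```python
def hinge(x):
    """Hinge loss ell_H(x) = max(0, 1 - x)."""
    return max(0.0, 1.0 - x)

def J(z):
    """J(z) = sum of hinge losses over the components of z."""
    return sum(hinge(zi) for zi in z)

# Example: z = (2, 0.5, -1) gives 0 + 0.5 + 2 = 2.5
print(J([2.0, 0.5, -1.0]))  # 2.5
```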
Let $J:X\rightarrow\mathbb R_{\infty}$ be of the form $J(x) = F(x) + G(Ax)$ with convex, lower semicontinuous, proper maps $F:X\rightarrow\mathbb R_{\infty}$ and $G:Y\rightarrow \mathbb R_{\infty}$ and linear bounded operator $A:X\rightarrow Y$. We introduce the perturbation $\Phi: X\times Y\rightarrow \mathbb R_{\infty}$, $\Phi(x, p) = F(x) + G(Ax-p)$ [...].
Definition. The primal problem is defined as $$\inf_{x\in X} \Phi(x, 0) = \inf_{x\in X}F(x) + G(Ax), \qquad (\mathcal P)$$ and the corresponding dual problem with respect to the perturbation $\Phi$ is defined by $$\sup_{p^{\star}\in Y^{\star}}\left\{ -\Phi^{\star}(0, p^{\star})\right\}. \qquad\qquad\qquad\qquad (\mathcal D)$$
Some remarks:
- $\Phi^{\star}$ refers to the Fenchel conjugate, I wrote down the definition in this post. $p^{\star}$ is an element of the dual space of $Y$, i.e. $p^{\star}\in Y^{\star}$.
- In the lecture, we also showed that $\Phi^{\star}(0, p^{\star}) = F^{\star}(A^{\star}p^{\star}) + G^{\star}(-p^{\star})$, where $F^{\star}$ and $G^{\star}$ again refer to the Fenchel conjugates. Thus, we can write the dual problem as $$\sup_{p^{\star}\in Y^{\star}}\left\{ -\Phi^{\star}(0, p^{\star})\right\} = \sup_{p^{\star}\in Y^{\star}}\left\{ -F^{\star}(A^{\star}p^{\star})-G^{\star}(-p^{\star})\right\}.$$
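Since the dual problem is built entirely from Fenchel conjugates, it can help to sanity-check a conjugate numerically. The sketch below (my own helper names, not part of the exercise) approximates $\ell_H^{\star}(y) = \sup_{x}\{xy - \ell_H(x)\}$ on a finite grid; a direct computation suggests $\ell_H^{\star}(y) = y$ for $y \in [-1, 0]$ and $+\infty$ otherwise, which the grid approximation reflects as a large finite value:

```python
def hinge(x):
    """Hinge loss ell_H(x) = max(0, 1 - x)."""
    return max(0.0, 1.0 - x)

def conj_hinge(y, lo=-50.0, hi=50.0, n=20001):
    """Grid approximation of the Fenchel conjugate
    ell_H^*(y) = sup_x { x*y - ell_H(x) }.
    For y in [-1, 0] the supremum is y (attained at x = 1);
    outside that interval the true supremum is +infinity,
    which shows up here only as a large finite value on the grid."""
    step = (hi - lo) / (n - 1)
    return max((lo + i * step) * y - hinge(lo + i * step) for i in range(n))

print(round(conj_hinge(-0.5), 3))  # -0.5
```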
I have the following two questions:
- I am not sure how to write the given function $J(z) = \sum_{i = 1}^{m}\ell_H(z_i) = \sum_{i = 1}^{m}\max\{0, 1-z_i\}$ in the form $J(x) = F(x) + G(Ax)$.
- As a first step of the exercise, we were also supposed to compute the subdifferential of the function $g(x) = \max\{0, x\}$. Why is this necessary? (Note that this is not a typo, so $g(x) \ne \ell_H(x)$.)