The context of my question has applications in machine learning, particularly SVMs, but that is not what I want to talk about here. Instead, I would like to focus on the mathematics.
So: Let $\ell_H: \mathbb R\rightarrow \mathbb R_{\infty}$ be the hinge loss $\ell_H(x) = \max\{0, 1-x\}$. Let $J:\mathbb R^m\rightarrow \mathbb R_{\infty}$ be the function (called "loss function" in machine learning) $J(z) = \sum_{i = 1}^{m}\ell_H(z_i)$. In an exercise, we are supposed to derive the dual problem for this loss function $J$. From our lecture notes:
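For concreteness, the definitions above can be evaluated numerically. Here is a minimal Python sketch (the helper names `hinge` and `J` are my own, not from the lecture):

```python
def hinge(x):
    """Hinge loss ell_H(x) = max(0, 1 - x)."""
    return max(0.0, 1.0 - x)

def J(z):
    """J(z) = sum of hinge losses over the components of z."""
    return sum(hinge(zi) for zi in z)

# Example: z = (2, 0.5, -1) gives 0 + 0.5 + 2 = 2.5
print(J([2.0, 0.5, -1.0]))  # 2.5
```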
Let $J:X\rightarrow\mathbb R_{\infty}$ be of the form $J(x) = F(x) + G(Ax)$ with convex, lower semicontinuous, proper maps $F:X\rightarrow\mathbb R_{\infty}$ and $G:Y\rightarrow \mathbb R_{\infty}$ and linear bounded operator $A:X\rightarrow Y$. We introduce the perturbation $\Phi: X\times Y\rightarrow \mathbb R_{\infty}$, $\Phi(x, p) = F(x) + G(Ax-p)$ [...].
Definition. The primal problem is defined as $$\inf_{x\in X} \Phi(x, 0) = \inf_{x\in X}F(x) + G(Ax), \qquad (\mathcal P)$$ and the corresponding dual problem with respect to the perturbation $\Phi$ is defined by $$\sup_{p^{\star}\in Y^{\star}}\left\{ -\Phi^{\star}(0, p^{\star})\right\}. \qquad\qquad\qquad\qquad (\mathcal D)$$
Some remarks:
- $\Phi^{\star}$ refers to the Fenchel conjugate, I wrote down the definition in this post. $p^{\star}$ is an element of the dual space of $Y$, i.e. $p^{\star}\in Y^{\star}$.
- In the lecture, we also showed that $\Phi^{\star}(0, p^{\star}) = F^{\star}(A^{\star}p^{\star}) + G^{\star}(-p^{\star})$, where $F^{\star}$ and $G^{\star}$ again refer to the Fenchel conjugates. Thus, we can write the dual problem as $$\sup_{p^{\star}\in Y^{\star}}\left\{ -\Phi^{\star}(0, p^{\star})\right\} = \sup_{p^{\star}\in Y^{\star}}\left\{ -F^{\star}(A^{\star}p^{\star})-G^{\star}(-p^{\star})\right\}.$$
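Since the dual problem is built entirely from Fenchel conjugates, it can help to sanity-check a conjugate numerically. The sketch below (my own helper names, not part of the exercise) approximates $\ell_H^{\star}(y) = \sup_{x}\{xy - \ell_H(x)\}$ on a finite grid; a direct computation suggests $\ell_H^{\star}(y) = y$ for $y \in [-1, 0]$ and $+\infty$ otherwise, which the grid approximation reflects as a large finite value:

```python
def hinge(x):
    """Hinge loss ell_H(x) = max(0, 1 - x)."""
    return max(0.0, 1.0 - x)

def conj_hinge(y, lo=-50.0, hi=50.0, n=20001):
    """Grid approximation of the Fenchel conjugate
    ell_H^*(y) = sup_x { x*y - ell_H(x) }.
    For y in [-1, 0] the supremum is y (attained at x = 1);
    outside that interval the true supremum is +infinity,
    which shows up here only as a large finite value on the grid."""
    step = (hi - lo) / (n - 1)
    return max((lo + i * step) * y - hinge(lo + i * step) for i in range(n))

print(round(conj_hinge(-0.5), 3))  # -0.5
```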
I have the following two questions:
- I am not sure how to write the given function $J(z) = \sum_{i = 1}^{m}\ell_H(z_i) = \sum_{i = 1}^{m}\max\{0, 1-z_i\}$ in the form $J(x) = F(x) + G(Ax)$.
- As a first step of the exercise, we were also supposed to compute the subdifferential of the function $g(x) = \max\{0, x\}$. Why is this necessary? (Note that this is not a typo, so $g(x) \ne \ell_H(x)$.)