Understanding Subgradient?

Question

I have seen in some literature that the derivative of the $l_1$-norm is represented by the $\operatorname{sgn}(\cdot)$ function. I know that, in general, the $l_1$-norm is not differentiable, so talking about a gradient doesn't make sense; what is actually meant is a subgradient. I read a little about subgradients, but I am still confused. Can somebody explain the main idea in simple language? What does it mean to have a set of lines with slopes between $-1$ and $1$ as a subgradient of the $l_1$-norm?

angryavian · Accepted Answer · 2017-07-30T23:20:46.673

The subdifferential of $f:\mathbb{R}^n \to \mathbb{R}$ at some $x_0 \in \mathbb{R}^n$ is the set of all vectors $g \in \mathbb{R}^n$ (called subgradients) such that $$f(x_0) + \langle g,x - x_0 \rangle \le f(x), \forall x \in \mathbb{R}^n.$$
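The definition can be turned into a small numerical test, which may help build intuition. This is a sketch under stated assumptions: the function name `violates_subgradient`, the sampling box, and the tolerance are illustrative choices rather than anything standard. Random sampling can only refute the inequality (by finding a point where the affine function rises above $f$); it can never prove that $g$ is a subgradient.

```python
import numpy as np

def violates_subgradient(f, g, x0, n_samples=10_000, scale=10.0, seed=0):
    """Search for a point x with f(x0) + <g, x - x0> > f(x).

    x0 and g are 1-D sequences of the same length. Returns the first
    counterexample found, or None if no sampled point violates the
    inequality (which does NOT prove g is a subgradient).
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    g = np.asarray(g, dtype=float)
    # Hypothetical sampling scheme: uniform points in a box of
    # half-width `scale` centred at x0.
    xs = x0 + rng.uniform(-scale, scale, size=(n_samples, x0.size))
    for x in xs:
        # Small tolerance guards against floating-point round-off.
        if f(x0) + g @ (x - x0) > f(x) + 1e-12:
            return x
    return None

def l1(x):
    """The l1 norm: sum of absolute values of the coordinates."""
    return np.abs(x).sum()

# g = (1, -1) is the sign vector of x0 = (2, -3): no counterexample expected.
print(violates_subgradient(l1, [1.0, -1.0], [2.0, -3.0]))
# g = (0.5, -1) is not a subgradient there, so a violating point is found.
print(violates_subgradient(l1, [0.5, -1.0], [2.0, -3.0]) is not None)
```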

If $f$ were differentiable and convex, then the above would hold for $g=\nabla f(x_0)$, since the left-hand side would be the linear approximation of $f$ near $x_0$, which would lie below the function $f$. If you want to visualize this, in dimension $n=1$ this is saying that the tangent line of a convex function at some $x_0$ lies below the graph of the function. If $n=2$, then the left-hand side is the tangent plane that approximates $f$ at $x_0$, and lies below the graph of $f$.
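As a concrete check of the tangent-line claim (the choice $f(x)=x^2$ with $n=1$ is only an illustration, not taken from the question), the gap between $f$ and its tangent line at $x_0$ is a perfect square:

$$\begin{aligned}
f(x) - \big(f(x_0) + f'(x_0)(x - x_0)\big) &= x^2 - \big(x_0^2 + 2x_0(x - x_0)\big) \\
&= x^2 - 2x_0 x + x_0^2 \\
&= (x - x_0)^2 \ge 0,
\end{aligned}$$

so $g = 2x_0$ satisfies the subgradient inequality at every $x_0$. Since this $f$ is convex and differentiable, $\{2x_0\}$ is its entire subdifferential.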

When $f$ is not differentiable, there may be more than one such $g$ that works. For instance, consider $f(x)=|x|$, which is the $l^1$ norm in one dimension. Then the left-hand side is a line with slope $g \in \mathbb{R}$.

- If $x_0>0$, the only $g$ that satisfies the definition is $g=1$, which is the sign of $x_0$.
- Similarly, if $x_0<0$, the only $g$ that satisfies the definition is $g=-1$, which is the sign of $x_0$.
- However, if $x_0=0$, then any $g \in [-1,1]$ works.

[To visualize the above bullet points, try to draw a line that touches $f$ at $x_0$ but never goes above the graph of $f$. The valid slopes are exactly the elements of the subdifferential.]
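The same "draw a line below the graph" experiment can be done numerically for $f(x)=|x|$: try a handful of candidate slopes and keep those whose line through $(x_0, |x_0|)$ stays at or below $|x|$ on a grid of test points. This is a minimal sketch; the helper name `valid_slopes`, the grid on $[-5,5]$, and the candidate slopes are hypothetical choices, and a finite grid only approximates "for all $x$".

```python
import numpy as np

def valid_slopes(x0, candidates, grid):
    """Return the candidate slopes g for which the line
    |x0| + g * (x - x0) stays at or below |x| at every grid point."""
    # One row per candidate slope, one column per test point x.
    lhs = np.abs(x0) + np.outer(candidates, grid - x0)
    # Small tolerance absorbs floating-point round-off where the line touches |x|.
    ok = np.all(lhs <= np.abs(grid) + 1e-12, axis=1)
    return candidates[ok]

grid = np.linspace(-5.0, 5.0, 1001)     # test points x
candidates = np.linspace(-2.0, 2.0, 9)  # slopes -2, -1.5, ..., 2
for x0 in (2.0, -2.0, 0.0):
    print(x0, valid_slopes(x0, candidates, grid))
```

For $x_0=2$ only the slope $1$ survives, for $x_0=-2$ only $-1$, and for $x_0=0$ every candidate in $[-1,1]$ survives, matching the three bullet points.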

Thus, the subdifferential is $$\partial f(x_0) = \begin{cases} \{\text{sgn}(x_0)\} & x_0 \ne 0 \\ [-1,1] & x_0 = 0\end{cases}$$
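Each case can be checked directly from the definition $|x_0| + g(x - x_0) \le |x|$; the test points $x = 0$, $x = 2x_0$ and $x = \pm 1$ below are just convenient choices:

$$\begin{aligned}
&x_0 > 0: && x = 0 \implies x_0 - g x_0 \le 0 \implies g \ge 1, \qquad x = 2x_0 \implies x_0 + g x_0 \le 2x_0 \implies g \le 1,\\
& && \text{and } g = 1 \text{ works since } x_0 + (x - x_0) = x \le |x|,\\
&x_0 = 0: && x = \pm 1 \implies \pm g \le 1, \qquad \text{and if } |g| \le 1 \text{ then } g x \le |g|\,|x| \le |x| \text{ for all } x.
\end{aligned}$$

The case $x_0 < 0$ is symmetric and gives $g = -1$.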

A similar argument works for the $l^1$ norm in arbitrary dimension.
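In $\mathbb{R}^n$, since $\|x\|_1 = \sum_i |x_i|$ is a sum of functions of separate coordinates, the subdifferential is described coordinate by coordinate: $g_i = \text{sgn}(x_{0,i})$ when $x_{0,i} \ne 0$, and $g_i \in [-1,1]$ when $x_{0,i} = 0$. This also explains the $\operatorname{sgn}$ notation from the question: with the convention $\text{sgn}(0) = 0$, the vector $\text{sgn}(x_0)$ is one particular element of that set. The sketch below encodes this; the function names `l1_subdifferential` and `is_l1_subgradient` are hypothetical, chosen for illustration.

```python
import numpy as np

def l1_subdifferential(x0):
    """Per-coordinate bounds (lo, hi) describing the subdifferential of
    ||x||_1 at x0: coordinate i is pinned to sign(x0[i]) when x0[i] != 0
    and ranges over [-1, 1] when x0[i] == 0."""
    x0 = np.asarray(x0, dtype=float)
    s = np.sign(x0)  # np.sign returns 0.0 for zero entries
    lo = np.where(x0 == 0, -1.0, s)
    hi = np.where(x0 == 0, 1.0, s)
    return lo, hi

def is_l1_subgradient(g, x0):
    """True if every coordinate of g lies within its allowed interval."""
    lo, hi = l1_subdifferential(x0)
    g = np.asarray(g, dtype=float)
    return bool(np.all((lo <= g) & (g <= hi)))

x0 = np.array([3.0, 0.0, -1.5])
print(l1_subdifferential(x0))
print(is_l1_subgradient(np.sign(x0), x0))       # sgn(x0) is one valid choice
print(is_l1_subgradient([1.0, 0.7, -1.0], x0))  # any value in [-1, 1] at the zero coordinate
print(is_l1_subgradient([1.0, 0.0, 1.0], x0))   # third coordinate must be -1
```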

What you are calling the subgradient, I have often heard referred to as the subdifferential. The elements of the subdifferential are called subgradients. — littleO, Jul 30 '17 at 22:46
