
I have seen in some literature that the derivative of the $l_1$-norm is represented by the $\text{sgn}(\cdot)$ function. I know that, in general, the $l_1$-norm is not differentiable, and therefore talking about its gradient doesn't make sense. In fact, what is meant is the sub-gradient. I read a little about the sub-gradient, but I am still confused. Can somebody explain the main idea in simple language? What does it mean to have a set of lines with slopes between $-1$ and $1$ as the sub-gradient of the $l_1$-norm?

fery

1 Answer


The subdifferential of $f:\mathbb{R}^n \to \mathbb{R}$ at some $x_0 \in \mathbb{R}^n$ is the set of all vectors $g \in \mathbb{R}^n$ (called subgradients) such that $$f(x_0) + \langle g,x - x_0 \rangle \le f(x), \forall x \in \mathbb{R}^n.$$

If $f$ were differentiable and convex, then the above would hold for $g=\nabla f(x_0)$, since the left-hand side would be the linear approximation of $f$ near $x_0$, which would lie below the function $f$. If you want to visualize this, in dimension $n=1$ this is saying that the tangent line of a convex function at some $x_0$ lies below the graph of the function. If $n=2$, then the left-hand side is the tangent plane that approximates $f$ at $x_0$, and lies below the graph of $f$.

When $f$ is not differentiable, there may be more than one such $g$ that works. For instance, consider $f(x)=|x|$, which is the $l^1$ norm in one dimension. Then the left-hand side is a line with slope $g \in \mathbb{R}$.

  • If $x_0>0$, the only $g$ that satisfies the definition is $g=1$, which is the sign of $x_0$.
  • Similarly, if $x_0<0$, the only $g$ that satisfies the definition is $g=-1$, which is again the sign of $x_0$.
  • However, if $x_0=0$, then any $g \in [-1,1]$ will work.

[To visualize the above bullet points, just try to draw a line that touches the graph of $f$ at $x_0$, but lies entirely on or below the graph of $f$. The valid slopes you can use are the elements of the subdifferential.]
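For readers who prefer algebra to pictures, the three cases can also be checked directly from the definition with $f(x)=|x|$. The following is only a short sketch of that check; the test points $x=\pm 1$, $x=0$ and $x=2x_0$ are convenient choices, not the only ones that work.

At $x_0=0$ the defining inequality reads $g x \le |x|$ for all $x$. Then
$$x=1 \implies g \le 1, \qquad x=-1 \implies -g \le 1,$$
so $|g|\le 1$ is necessary. Conversely, if $|g| \le 1$ then $g x \le |g|\,|x| \le |x|$, so every $g \in [-1,1]$ works.

At $x_0>0$ the inequality reads $x_0 + g(x-x_0) \le |x|$ for all $x$. Then
$$x=2x_0 \implies x_0 + g x_0 \le 2x_0 \implies g \le 1, \qquad x=0 \implies x_0 - g x_0 \le 0 \implies g \ge 1,$$
so $g=1$ is the only candidate, and it does work because $x_0 + (x-x_0) = x \le |x|$. The case $x_0<0$ is symmetric and gives $g=-1$.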

Thus, the subdifferential is $$\partial f(x_0) = \begin{cases} \{\text{sgn}(x_0)\} & x_0 \ne 0 \\ [-1,1] & x_0 = 0\end{cases}$$
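If it helps to experiment, here is a minimal Python sketch that tests the defining inequality for $f(x)=|x|$ on a finite grid of points. The function names `subdifferential_abs` and `is_subgradient`, the grid `xs`, and the tolerance `tol` are all illustrative choices made here, and a check on finitely many points can only suggest, not prove, that a slope is a subgradient.

```python
def sign(x):
    """Sign of a real number: -1, 0, or 1."""
    return (x > 0) - (x < 0)

def subdifferential_abs(x0):
    """Subdifferential of f(x) = |x| at x0, returned as a closed interval (lo, hi)."""
    if x0 == 0:
        return (-1.0, 1.0)      # every slope in [-1, 1] works at the kink
    s = float(sign(x0))
    return (s, s)               # a single slope: the ordinary derivative

def is_subgradient(g, x0, xs, tol=1e-12):
    """Check f(x0) + g*(x - x0) <= f(x) at every sample point x in xs, for f = |.|."""
    return all(abs(x0) + g * (x - x0) <= abs(x) + tol for x in xs)

# Sample grid on [-5, 5]; the range and spacing are arbitrary.
xs = [i / 100.0 for i in range(-500, 501)]

print(is_subgradient(0.3, 0.0, xs))   # slope inside [-1, 1] at x0 = 0
print(is_subgradient(1.2, 0.0, xs))   # slope outside [-1, 1] at x0 = 0
print(is_subgradient(1.0, 2.0, xs))   # slope sgn(x0) at x0 = 2
print(is_subgradient(0.5, 2.0, xs))   # any other slope at x0 = 2
```

The first and third calls should report that the line stays below the graph, while the second and fourth should find a sample point where the line pokes above $|x|$.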

A similar argument works for the $l^1$ norm in arbitrary dimension.
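To spell that out a little (a sketch, writing $x_{0,i}$ for the $i$-th coordinate of $x_0$, which is notation introduced here): since $\|x\|_1 = \sum_{i=1}^n |x_i|$ is a sum of one-dimensional absolute values, the one-dimensional result applies coordinate by coordinate, giving
$$\partial \|\cdot\|_1 (x_0) = \left\{ g \in \mathbb{R}^n : g_i = \text{sgn}(x_{0,i}) \text{ if } x_{0,i} \ne 0, \quad g_i \in [-1,1] \text{ if } x_{0,i} = 0 \right\}.$$
In particular, the vector $\text{sgn}(x_0)$, taken componentwise with the convention $\text{sgn}(0)=0$, is always one element of this set, which is presumably why the literature writes the "derivative" of the $l^1$ norm as $\text{sgn}(\cdot)$.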

angryavian