I was studying Stephen Boyd's textbook on convex optimization. It says the following:
The amplitude distribution of the optimal residual for the l1-norm approximation problem will tend to have more zero and very small residuals, compared to the l2-norm approximation solution. In contrast, the l2-norm solution will tend to have relatively fewer large residuals (since large residuals incur a much larger penalty in l2-norm approximation than in l1-norm approximation).
I understand why the second sentence holds -- obviously, the l2 norm places a much higher penalty on large residuals and hence would yield fewer of them. But I can't understand the first sentence. The l1 norm places a higher penalty than the l2 norm on residuals between 0 and 1 (for example, a residual of 0.1 costs 0.1 under the l1 norm but only 0.01 under the l2 norm), so it seems to me that the l2 norm should yield more small residuals. Can anybody explain why the l1 norm generates more zero and very small residuals than the l2 norm?
In fact, the two statements sound contradictory to each other: if the l2 norm generates fewer large residuals, it seems like it should also generate more small residuals than the l1 norm.
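To make the question concrete, here is a small numerical sketch (assuming numpy and cvxpy are available; the problem sizes, the random data, and the 1e-6 and 2.0 thresholds are arbitrary choices of mine, not from the book) that compares the residuals of the two approximation problems:

```python
# Sketch: solve an overdetermined approximation problem min ||A x - b||
# under the l1 and l2 norms and compare the residual distributions.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 200, 30                      # overdetermined: many more residuals than variables
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# l2-norm (least-squares) solution
x2, *_ = np.linalg.lstsq(A, b, rcond=None)
r2 = A @ x2 - b

# l1-norm solution as a convex program
x1 = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(A @ x1 - b))).solve()
r1 = A @ x1.value - b

for name, r in [("l1", r1), ("l2", r2)]:
    print(f"{name}: #|r| < 1e-6: {np.sum(np.abs(r) < 1e-6):3d}, "
          f"#|r| > 2: {np.sum(np.abs(r) > 2):3d}, max |r|: {np.abs(r).max():.2f}")
```

The printed counts should show the pattern the book describes: the l1 solution has many residuals that are numerically zero but a larger maximum residual, while the l2 solution has almost no zero residuals but fewer large ones. That is exactly the behavior I would like to understand.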