Gradient of the elastic net with extra terms

Question

Can anyone tell me the gradient of the function below (w.r.t. $X$)?

$$ \operatorname*{arg\,min}_{X} ~~\frac{\lambda}{2}\lVert X\rVert_2^2 + \lVert X\rVert_1 + \operatorname{tr}\bigg(\Delta^T\Big(\operatorname{diag}(X)-X\Big)\bigg) + \frac{\rho}{2}\lVert K+\operatorname{diag}(X)-X\rVert_2^2 $$

I am able to find the gradient of every term except the $\ell_1$ term. In fact, I am trying to find the value of $X$ at which the gradient equals zero (the minimum).

My result so far is

$$ \lambda X + \operatorname{diag}(\Delta)-X + \rho\Big(\operatorname{diag}(K)-K+X\Big) + \nabla(\lVert X\rVert_1) $$

score 2 · Accepted Answer · edited Apr 13 '17 at 12:21

I don't think your "result so far" is entirely accurate.

Defining $\mathbb I$ as the matrix of all ones, $I$ as the identity matrix, and $E=({\mathbb I}-I)$, and using the Hadamard ($\circ$) product, we can write $(-E\circ X)$ in place of those bulky $({\rm diag}(X)-X)$ terms.

Ignoring the L1-term, and using the Frobenius (:) product, the function to be minimized is

$$ f = \frac{\lambda}{2}(X):(X)-(\Delta):(E\circ X) + \frac{\rho}{2}(K-E\circ X):(K-E\circ X) $$

Its differential is

$$\eqalign{ df &= (\lambda X):(dX)-(\Delta):(E\circ dX)-\rho(K-E\circ X):(E\circ dX) \cr &= (\lambda X):(dX)-(\Delta\circ E):(dX)-\rho(E\circ K-E\circ E\circ X):(dX) \cr &= [\lambda X-\Delta\circ E-\rho E\circ(K-X)]:(dX) \cr }$$

where the second line uses $A:(E\circ B)=(E\circ A):B$ and the third uses $E\circ E=E$, since every entry of $E$ is 0 or 1. Since $df=\frac{\partial f}{\partial X}:dX,\,$ the derivative must be

$$\eqalign{ \frac{\partial f}{\partial X} &= \lambda X-E\circ\Delta-\rho E\circ(K-X) \cr &= \lambda X+({\rm diag}(\Delta)-\Delta)+\rho({\rm diag}(K)-K+X-{\rm diag}(X))\cr }$$

This differs from your result, unless you're relying on special properties of $\{X,\Delta\}$ that you forgot to tell us about in order to simplify it.
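One way to sanity-check the derivative above is to compare it with central finite differences on random data. The following is only a sketch using the Python standard library: the helper names (`f_smooth`, `grad_smooth`, `finite_diff`), the matrix size, and the values of $\lambda$ and $\rho$ are illustrative assumptions, not part of the question.

```python
import random

def f_smooth(X, Delta, K, lam, rho):
    """Objective without the L1 term, written with diag(X) - X as in the question."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = 0.0 if i == j else -X[i][j]      # entry (i, j) of diag(X) - X
            total += 0.5 * lam * X[i][j] ** 2    # (lam/2) * ||X||_F^2
            total += Delta[i][j] * d             # tr(Delta^T (diag(X) - X))
            total += 0.5 * rho * (K[i][j] + d) ** 2
    return total

def grad_smooth(X, Delta, K, lam, rho):
    """Analytic gradient lam*X - E*Delta - rho*E*(K - X), with * entrywise
    and E = (all ones) - (identity), i.e. E is 0 on the diagonal and 1 elsewhere."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e = 0.0 if i == j else 1.0
            G[i][j] = lam * X[i][j] - e * Delta[i][j] - rho * e * (K[i][j] - X[i][j])
    return G

def finite_diff(f, X, h=1e-6):
    """Central-difference approximation of the gradient of f at X."""
    n = len(X)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            old = X[i][j]
            X[i][j] = old + h
            fp = f(X)
            X[i][j] = old - h
            fm = f(X)
            X[i][j] = old                        # restore the entry
            G[i][j] = (fp - fm) / (2 * h)
    return G

random.seed(0)
n, lam, rho = 4, 0.7, 1.3                        # arbitrary illustrative values

def rand_matrix():
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

X, Delta, K = rand_matrix(), rand_matrix(), rand_matrix()
analytic = grad_smooth(X, Delta, K, lam, rho)
numeric = finite_diff(lambda M: f_smooth(M, Delta, K, lam, rho), X)
err = max(abs(analytic[i][j] - numeric[i][j]) for i in range(n) for j in range(n))
print("max abs difference:", err)
```

Because $f$ is quadratic in $X$, the central difference is exact up to rounding, so the printed difference should be tiny if the formula is right.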

Update #1

Since you're using $\|X\|_2$ to denote the Frobenius norm, I assume you are using Schatten norms rather than induced norms.

In that case, $\|X\|_1$ denotes the Nuclear norm, for which the derivative is known. $$ \eqalign{ \frac{\partial\,\|X\|_{*}}{\partial X} &= X(X^TX)^{-1/2} \cr }$$
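As a brief check of this formula, assume $X$ has full column rank (so that $X^TX$ is invertible) and write its thin SVD as $X=U\Sigma V^T$; the symbols $U,\Sigma,V$ are introduced here for illustration and do not appear in the question. Then

$$\eqalign{ X^TX &= V\Sigma^2V^T \cr (X^TX)^{-1/2} &= V\Sigma^{-1}V^T \cr X(X^TX)^{-1/2} &= U\Sigma V^TV\Sigma^{-1}V^T = UV^T \cr }$$

so under that assumption the gradient is simply $UV^T$. If $X$ is rank-deficient, the nuclear norm is not differentiable at $X$ and only a subgradient is available.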

Update #2

Upon further reflection, I suppose you could be using entrywise norms, in which case $\|X\|_2$ still denotes the Frobenius norm, but $\|X\|_1$ denotes the Manhattan norm.

$$ \eqalign{ \|X\|_1 &= {\mathbb I}:{\rm abs}(X) \cr d\,\|X\|_1 &= {\mathbb I}:d({\rm abs}(X)) \cr &= {\mathbb I}:({\rm sign}(X)\circ dX) \cr &= ({\mathbb I}\circ{\rm sign}(X)):dX \cr &= {\rm sign}(X):dX \cr \frac{\partial\,\|X\|_1}{\partial X} &= {\rm sign}(X) \cr }$$

where the $\{{\rm abs},{\rm sign}\}$ functions are applied entrywise as well.
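The entrywise result can be checked numerically in the same spirit. This is a minimal sketch using only the standard library, and it assumes every entry of $X$ is nonzero: at a zero entry the absolute value is not differentiable, and any value in $[-1,1]$ is a valid subgradient there. The helper names and the test matrix are made up for illustration.

```python
import random

def l1_norm(X):
    """Entrywise (Manhattan) norm: sum of absolute values of all entries."""
    return sum(abs(x) for row in X for x in row)

def sign(x):
    """Scalar sign function returning -1, 0, or 1."""
    return (x > 0) - (x < 0)

random.seed(1)
n, h = 3, 1e-6
# Entries have magnitude at least 0.1, so the step h never crosses zero.
X = [[random.choice([-1, 1]) * random.uniform(0.1, 1.0) for _ in range(n)]
     for _ in range(n)]

max_err = 0.0
for i in range(n):
    for j in range(n):
        old = X[i][j]
        X[i][j] = old + h
        fp = l1_norm(X)
        X[i][j] = old - h
        fm = l1_norm(X)
        X[i][j] = old                            # restore the entry
        numeric = (fp - fm) / (2 * h)            # central difference
        max_err = max(max_err, abs(numeric - sign(old)))

print("max abs difference from sign(X):", max_err)
```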

I removed part of my initial answer involving the L1-norm, since it was, mistakenly, a result about the L0-norm. — lynn, May 16 '15 at 22:19
