Consider $\|W^TW-I\|_*$, where $W\in R^{m\times n}$ with $n\leq m$, $I\in R^{n\times n}$ is the identity matrix, and $\|\cdot\|_*$ is the nuclear norm, also known as the trace norm.
Q1: $\|W^TW-I\|_*$ is often used as a regularization term in machine learning algorithms. Minimizing such a term can be done by computing its (sub-)gradient with respect to the matrix variable $W$. How can this be done? I only know the (sub-)gradient of $\|W\|_*$, as answered in "Derivative of the nuclear norm with respect to its argument".
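For concreteness, here is a minimal NumPy sketch of one way such a (sub-)gradient can be formed, under the following chain-rule argument (my own working, so please check it): write $A = W^TW - I$, which is symmetric, take its eigendecomposition $A = U\Lambda U^T$, and note that $G = U\,\mathrm{sign}(\Lambda)\,U^T$ is a subgradient of $\|\cdot\|_*$ at $A$; then $d\|A\|_* = \langle G,\, dW^TW + W^TdW\rangle = \langle 2WG,\, dW\rangle$ since $G$ is symmetric, giving $2WG$ as a (sub-)gradient with respect to $W$. The function name `nuclear_subgrad_wtw` is mine, not from any library:

```python
import numpy as np

def nuclear_subgrad_wtw(W):
    """A (sub-)gradient of ||W^T W - I||_* with respect to W (sketch)."""
    n = W.shape[1]
    # A = W^T W - I is symmetric, so its SVD reduces to an
    # eigendecomposition A = U diag(lam) U^T.
    A = W.T @ W - np.eye(n)
    lam, U = np.linalg.eigh(A)        # O(n^3), n is the smaller dimension
    # G = U diag(sign(lam)) U^T is a subgradient of ||.||_* at A
    # (for zero eigenvalues, sign(0) = 0 is one valid choice).
    G = (U * np.sign(lam)) @ U.T
    # Chain rule: d||A||_* = <G, dW^T W + W^T dW> = <2 W G, dW>.
    return 2.0 * W @ G

# Hypothetical usage on a random matrix:
W = np.random.randn(8, 3)
grad = nuclear_subgrad_wtw(W)
```

At points where no eigenvalue of $A$ is zero this should be the actual gradient, which can be sanity-checked against finite differences.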
Q2: This may be more complicated. The (sub-)gradient of $\|W\|_*$ involves a singular value decomposition, and "Is there a nuclear norm approximation for stochastic gradient descent optimization?" provides some ways to make it efficient. I am afraid the solution to Q1 will also be computationally expensive. Is there any method to make it efficient?
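One cost observation: $A = W^TW - I$ is only $n\times n$, so the eigendecomposition above costs $O(n^3)$, and forming $W^TW$ costs $O(mn^2)$; that is no worse than an SVD of $W$ itself. If even that is too expensive, one possible workaround (my suggestion, not from the linked question) is to swap the nuclear norm for the smooth surrogate $\|W^TW-I\|_F^2$, whose gradient $4W(W^TW-I)$ needs no eigendecomposition at all:

```python
import numpy as np

def frobenius_surrogate_grad(W):
    """Gradient of the smooth surrogate ||W^T W - I||_F^2 w.r.t. W.

    Costs O(m n^2) per evaluation and avoids any eigendecomposition,
    at the price of regularizing a different (smooth) penalty.
    """
    n = W.shape[1]
    return 4.0 * W @ (W.T @ W - np.eye(n))
```

This changes the regularizer, of course, but both penalties vanish exactly when $W$ has orthonormal columns, so the surrogate may be acceptable when the nuclear norm itself is not essential.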