
If $M$ is an $n\times n$ matrix, let $f(M)$ denote the largest eigenvalue (in absolute value) of $M$. In other words, if $\lambda_1,\dots,\lambda_n$ are the eigenvalues of $M$, define

$$f(M) = \max(|\lambda_1|,\dots,|\lambda_n|).$$

This can be viewed as a function $f:\mathbb{R}^{n^2} \to \mathbb{R}$ on an $n^2$-dimensional input.
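(For concreteness, $f(M)$ itself is easy to evaluate numerically, e.g. with numpy; a minimal sketch, where the helper name `spectral_radius` is purely illustrative:)

```python
import numpy as np

def spectral_radius(M):
    """f(M): the largest eigenvalue of M in absolute value."""
    eigvals = np.linalg.eigvals(M)   # all eigenvalues (possibly complex)
    return np.max(np.abs(eigvals))

M = np.array([[2.0, 1.0],
              [0.0, -3.0]])
print(spectral_radius(M))  # eigenvalues are 2 and -3, so this prints 3.0
```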

Now given a matrix $M$, I'd like to compute the gradient $\nabla f(M)$. How do I do that?

Equivalently, for each $i,j$, I want to compute the derivative ${\partial \over \partial M_{i,j}} f(M)$ of $f(M)$ with respect to the $i,j$-th entry of the matrix. I can't figure out a clean way to compute this, as computing the eigenvalues involves Gaussian elimination, and it's not clear how to differentiate through that process.

(This is based on an application where I want to do gradient descent on a function with a term of the form $f(M)$, so I need to be able to compute the gradient to do that.)

D.W.
  • I suggest approximating the gradient numerically. – quasi Mar 30 '18 at 18:44
  • You might need simplicity of the largest eigenvalue for differentiability. – Surb Mar 30 '18 at 18:47
  • Are you certain that the largest eigenvalue is a differentiable function of the matrix entries? – John Hughes Mar 30 '18 at 18:48
  • Related (for symmetric $M$): https://math.stackexchange.com/questions/929434/derivative-of-spectral-norm-of-symmetric-matrix – Surb Mar 30 '18 at 18:51
  • @JohnHughes, in general, no -- but I am happy to restrict attention to cases where the function $f$ is differentiable at $M$ (or to accept a subgradient if it is not, but maybe that's extra complexity). – D.W. Mar 30 '18 at 19:08
  • @quasi, approximating the gradient numerically is possible but expensive ($\sim O(n^3)$ time to compute the eigenvalues, multiplied by $n^2$ entries of the matrix, for a total of $O(n^5)$ time to compute the approximate gradient). I was hoping there might be another way to do it analytically/directly. – D.W. Mar 30 '18 at 21:59
  • Related: the corresponding calculation, for eigenvectors: https://math.stackexchange.com/q/1305122/14578. – D.W. Apr 30 '18 at 04:32

2 Answers


It appears the solution is to find the largest eigenvalue, say $\lambda_1$, and find the corresponding eigenvector, call it $v_1$. Then, the derivative ${\partial f \over \partial M_{i,j}}(M)$ is given by

$${\partial f \over \partial M_{i,j}}(M) = {\partial \lambda_1 \over \partial M_{i,j}} = (v_1 \cdot b_i) (v_1 \cdot b_j) = (v_1)_i \cdot (v_1)_j,$$

where I have used a formula for the derivative of an eigenvalue taken from Derivatives of eigenvalues. Here $b_i$ denotes the standard basis vector with a $1$ in the $i$th position and $0$s elsewhere, and $v_1$ is taken to be normalized to unit length (the formula requires this).

Thus, once we have computed the largest eigenvalue and its corresponding eigenvector, we can easily fill in the entries of the gradient $\nabla f(M)$ as above.
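Here is a minimal numpy sketch of this recipe (illustrative only; the names `f` and `grad_f` are not from the linked question). It assumes $M$ is symmetric, so `np.linalg.eigh` applies and the right eigenvector also serves as the left one, and it includes a sign factor to account for the absolute value in $f$ when the dominant eigenvalue is negative:

```python
import numpy as np

def f(M):
    """f(M) = max |eigenvalue of M|."""
    return np.max(np.abs(np.linalg.eigvals(M)))

def grad_f(M):
    """Gradient of f at a symmetric matrix M, assuming the dominant
    eigenvalue is simple (otherwise f need not be differentiable)."""
    eigvals, eigvecs = np.linalg.eigh(M)      # symmetric eigendecomposition
    k = np.argmax(np.abs(eigvals))            # index of the largest |lambda|
    v1 = eigvecs[:, k]                        # unit-norm eigenvector
    grad_lambda = np.outer(v1, v1)            # d lambda_k / d M_{ij} = (v1)_i (v1)_j
    return np.sign(eigvals[k]) * grad_lambda  # chain rule through |lambda_k|

# sanity check of one entry against a finite difference
M = np.random.randn(4, 4); M = (M + M.T) / 2
i, j, eps = 0, 1, 1e-6
Mp = M.copy(); Mp[i, j] += eps
print(grad_f(M)[i, j], (f(Mp) - f(M)) / eps)  # should agree to ~1e-5
```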

D.W.

What you write is known. On the other hand, at each step of the descent you must recompute $\lambda$ and $v$. To do that, store the previous value of $v$ and use the power method (normally $M$ changes only slightly from one iteration to the next). Very few iterations of $v := Mv/\|Mv\|$ suffice in general (for example $2$), and each iteration has complexity $O(n^2)$. Note that if the convergence is slow, then there is another eigenvalue whose modulus is close to that of $\lambda$; in such a case, $\lambda_{\max}$ may not be differentiable. Beware: your formula gives the derivative of $\lambda$, not that of $|\lambda|$, which matters especially when $\lambda$ is complex.
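A rough numpy sketch of that warm-started update (the names are illustrative only; it assumes the dominant eigenvalue is real and well separated, per the caveats above):

```python
import numpy as np

def power_refresh(M, v_prev, iters=2):
    """Re-estimate the dominant eigenpair of M by the power method,
    starting from the eigenvector saved at the previous descent step.
    Each iteration v := Mv/||Mv|| costs O(n^2)."""
    v = v_prev.copy()
    for _ in range(iters):
        Mv = M @ v
        v = Mv / np.linalg.norm(Mv)
    lam = v @ (M @ v)   # Rayleigh-quotient estimate of lambda (||v|| = 1)
    return lam, v

# inside a gradient-descent loop (sketch):
#   lam, v = power_refresh(M, v)             # reuse v from the previous iterate
#   grad = np.sign(lam) * np.outer(v, v)     # symmetric case, per the other answer
#   M = M - step * grad
```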

  • Wonderful! You even answered the question I should have asked but didn't think to, about how to update this in each iteration of gradient descent. Thank you -- this is great stuff, I appreciate it. – D.W. Mar 31 '18 at 15:57