
As written in the title, I have a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ and I want to find the direction of least change. However, I have encountered an inconsistency between two different methods, which I will outline below.

I start with the directional derivative as given by this answer: $(\nabla f)\cdot v$. This will give the rate of change along any direction $v = \sum_i\alpha_i\vec{e}_i$ with $\sum_i\alpha_i^2=1$. To obtain the coefficients of the vector corresponding to the least change, I want to minimize the square of that expression: $$ \min_{\{\alpha_i\}} \left[(\nabla f)\cdot v\right]^2 $$

This is equivalent to the following quadratic problem: $$ \min_{v} \quad v^T \cdot \underbrace{ \begin{pmatrix} \partial_1\partial_1 & \partial_1\partial_2 & \cdots & \partial_1\partial_n \\ \partial_2\partial_1 & \partial_2\partial_2 & \cdots & \partial_2\partial_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial_n\partial_1 & \partial_n\partial_2 & \cdots & \partial_n\partial_n \\ \end{pmatrix} }_{\equiv A} \cdot v \qquad\mathrm{subject}\;\mathrm{to}\quad \lVert v \rVert = 1 $$ where $\partial_i$ denotes the $i$-th partial derivative of $f$ evaluated at some reference point, i.e. $A = (\nabla f)(\nabla f)^T$.
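For concreteness, here is a small sketch (NumPy, with a made-up gradient) checking that the quadratic form $v^T A v$ indeed equals the squared directional derivative when $A$ is built as the outer product of the gradient with itself:

import numpy as np

rng = np.random.default_rng(0)
grad = rng.normal(size=5)        # placeholder gradient, n = 5
A = np.outer(grad, grad)         # A[i, j] = (d_i f)(d_j f)

v = rng.normal(size=5)
v /= np.linalg.norm(v)           # random unit direction

print(v @ A @ v, (grad @ v)**2)  # the two values agree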

In order to solve this, I am following the paper William W. Hager, "Minimizing a Quadratic Over a Sphere" (full text available here). Relevant are Lemmas 2.1 and 2.2. In summary, the result is that the direction of least change is given by the eigenvector corresponding to the smallest eigenvalue of $A$. However, the paper does not claim that the smallest eigenvalue must be positive; in fact, Lemma 2.1 only requires that $A - \lambda_1 I$ be positive semidefinite ($\lambda_1$ being the smallest eigenvalue and $I$ the identity matrix).
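In practice, extracting that eigenvector can be done with any symmetric eigensolver; here is a minimal sketch (using np.linalg.eigh on a placeholder gradient, not the algorithm from the paper itself):

import numpy as np

grad = np.array([3.0, -1.0, 2.0])      # placeholder gradient
A = np.outer(grad, grad)

eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
v_min = eigvecs[:, 0]                  # eigenvector of the smallest eigenvalue
print(eigvals[0], v_min @ A @ v_min)   # smallest eigenvalue and quadratic form at v_min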

The above findings are in agreement with this answer. That answer links to the Wikipedia article on the structure tensor, which appears to coincide with the above matrix $A$ when the window function $w(r)$ is the Dirac delta function. The article, however, states that all eigenvalues of $A$ should be positive (see here; this is also mentioned in this paper, in the paragraph following eq. (16)). The 3D interpretation with an ellipsoid also seems to require $\lambda_i\geq 0$, since the eigenvalues are associated with the semi-axes of the ellipsoid. This would imply that $A$ is already positive semidefinite, which seems to conflict with the statement in the above paper.

When I apply the above method, i.e. diagonalizing $A$, to a specific problem ($n=36$), I indeed find that some of the eigenvalues are negative; $A - \lambda_1 I$, on the other hand, is positive semidefinite. So I'm wondering whether this method is correct or whether there is something I'm missing. Does it have to do with the dimensionality of the problem (in my specific case $n=36$)?

The second thing I'm wondering about is that the rate of change is zero whenever $v$ is chosen perpendicular to $\nabla f$. For $n\geq 3$ there exist infinitely many unit vectors $v$ perpendicular to any given vector, so this seems to contradict the above finding that the direction of least change is given by precisely one of the eigenvectors.
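To illustrate the perpendicularity argument, here is a small sketch (again with a made-up gradient; scipy.linalg.null_space returns an orthonormal basis of the orthogonal complement):

import numpy as np
from scipy.linalg import null_space

grad = np.array([1.0, 2.0, -3.0, 0.5])  # placeholder gradient, n = 4
N = null_space(grad.reshape(1, -1))     # shape (4, 3): orthonormal basis of grad's orthogonal complement

# Any unit-norm combination of these basis vectors has zero first-order rate of change.
v = N @ np.array([0.2, -0.7, 0.4])
v /= np.linalg.norm(v)
print(grad @ v)                         # ~0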


Numerical example

The following are the 36 elements of the gradient, separated by commas, for my specific problem (it's a complex physics simulation, hence I'm just including the gradient here):

-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01

I use SymPy, specifically Matrix.diagonalize, to obtain the eigenvalues and eigenvectors:

import numpy as np
from sympy import Matrix

gradient = np.array([-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01])
A = Matrix(np.multiply.outer(gradient, gradient))
P, D = A.diagonalize(reals_only=True, sort=True, normalize=True)

print('----- Eigenvalues -----')
print(np.diag(D), end='\n\n')

which gives the following output:

----- Eigenvalues -----
[-2.12609948489760e-13 -1.52827730730385e-13 -1.00673222408694e-13
 -7.83376902220698e-14 -5.00864285330806e-14 -4.16036346697355e-14
 -3.07234097627031e-14 -2.42707863964688e-14 -2.00785824570343e-14
 -1.15863543324252e-14 -1.00286213791179e-14 -4.01745072568617e-15
 -1.65034450852433e-15 -8.84471783248045e-16 -7.34601713232325e-16
 -3.49931683094853e-16 -1.87968744502435e-16 -3.39521826408545e-17
 2.39349396454803e-78 9.63838283004546e-17 4.83310171616047e-16
 5.70723316062587e-16 1.21622134384129e-15 1.67127258435284e-15
 2.52716079698100e-15 3.56914743743705e-15 6.33591850166460e-15
 9.44400674233662e-15 2.04608434272058e-14 2.93396570040894e-14
 4.82240672003627e-14 5.21039117277875e-14 7.82644482744230e-14
 1.11237991654078e-13 2.17480897644168e-13 10853.3563960944]

The first 35 eigenvalues have a very small magnitude compared to the 36th one; some of them are negative, others positive.

a_guest
    Rather than looking at the square, you could simply note that $\nabla f \cdot v = 0$ is a linear system of equations on the entries of $v$. – Ben Grossmann Jan 20 '22 at 16:55
  • @BenGrossmann Right, that's the other inconsistency which I forgot to include in my original question. $\nabla f\cdot v = 0$ implies that any vector $v$ which is perpendicular to the gradient is a direction of zero change, i.e. the minimum change possible. For $n \geq 3$ there are infinitely many ways to choose such a vector which seems to contradict the other solution where the direction of least change is given by one of the eigenvectors (i.e. one specific direction). – a_guest Jan 20 '22 at 19:11
  • Ultimately, the question comes down to what you mean by "the rate of change along a direction". If all you're looking for is a direction $v$ such that we have $$ \frac{d}{dt} f(x + tv) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}= 0, $$ this holds iff $v \cdot \nabla f(x) = 0$. Hager's approach seems to consider a "second order" rate of change. That is, I believe that the goal in that context is to find a $v$ such that $$ \lim_{t \to 0}\frac{f(x + tv) - f(x)}{t^2} $$ is minimized. Note that in that paper, "minimum" does not refer to minimum absolute value. – Ben Grossmann Jan 20 '22 at 22:36
  • Normally, this "second order rate" is only considered when $\nabla f = 0$. – Ben Grossmann Jan 20 '22 at 22:39
  • Actually, the paper that you linked has no explicit connection to calculus; the goal seems to be merely to minimize $x^TAx - 2b^Tx$ without any notion of looking for a "direction of least change". – Ben Grossmann Jan 20 '22 at 22:44
  • @BenGrossmann Right, the paper only presents a general purpose method for minimizing $x^TAx-2b^Tx$ subject to $\lVert x\rVert =1$. I, on the other hand, have a specific use case where $A$ is the above mentioned matrix of partial derivatives and $b=0$. So I don't see why this method would not be applicable to my use case. However, the results are different to what I would expect from perpendicularity. – a_guest Jan 22 '22 at 21:06
  • In fact, the paper mentions that the solution is a linear combination of eigenvectors which correspond to the same smallest eigenvalue, i.e. if $\lambda_1 = \lambda_2 = \dots = \lambda_{n-1} < \lambda_n$ for this specific type of matrix, then the solutions would still span an $(n-1)$-dimensional subspace. But I don't see why that would be the case, and for my specific use case, when I compute the eigenvalues, I find $\lambda_1 < \lambda_2 < \dots < \lambda_n$, i.e. there is no multiplicity in the smallest eigenvalue. – a_guest Jan 22 '22 at 21:08
  • Can you give the specific example (or at least the gradient of $f$ at $x$) that gives negative eigenvalues? It doesn't seem possible to me that you'd get negative eigenvalues of $A$. – V.S.e.H. Jan 25 '22 at 20:24
  • @V.S.e.H. Please see my updated question. I've included the gradient as well as the code which I use to compute the eigenvalues. Taking another look at the eigenvalues now makes me wonder if this is perhaps a numerical issue (i.e. limited floating-point precision)? As you can see in the update, the first 35 eigenvalues range from $-10^{-13}$ to $10^{-13}$ while the 36th eigenvalue is of the order $10^{4}$. Perhaps the first 35 eigenvalues should actually have the same value? But then, can it be proven that for such a matrix $A$ we have $\lambda_1=\lambda_2=\dots=\lambda_{n-1}<\lambda_n$? – a_guest Jan 25 '22 at 20:50

1 Answer


Your matrix $A$ is the outer product of the gradient vector with itself, and therefore a rank-1 PSD matrix. As shown in this answer - https://math.stackexchange.com/a/55166/443030 - $A$ has the eigenvalue $0$ with multiplicity $n-1$ and a single nonzero eigenvalue equal to $\lVert \nabla f(x) \rVert^2$. The negative eigenvalues you obtained are in fact numerical inaccuracies and can be taken as zero, given the magnitude of the largest eigenvalue.
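A quick numerical check of this on the posted gradient (a sketch that reuses the gradient array from the question's snippet, but uses NumPy's np.linalg.eigh instead of SymPy):

import numpy as np

# `gradient` is the 36-element array defined in the question's snippet
A = np.outer(gradient, gradient)
eigvals = np.linalg.eigh(A)[0]             # ascending order

print(eigvals[-1], gradient @ gradient)    # both ~1.085e4, i.e. ||grad f(x)||^2
print(np.max(np.abs(eigvals[:-1])))        # the remaining 35 eigenvalues are numerically ~0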

As stated in the paper, the minimizer $v$ is an eigenvector corresponding to the smallest eigenvalue of $A$, which in this case is $0$. Note also that the eigenvector corresponding to the only nonzero eigenvalue is parallel to $\nabla f(x)$, which is the direction of maximum change (in fact, any scalar multiple of the gradient points along the direction of greatest change). So this is consistent with the fact that we would expect the least change to occur along any vector in the nullspace of the gradient, which is spanned by the eigenvectors corresponding to the eigenvalue $0$.
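The same check also confirms the correspondence between eigenvectors and directions (a sketch continuing from the snippet above, again assuming the gradient array from the question):

eigvals, eigvecs = np.linalg.eigh(A)
v_max = eigvecs[:, -1]                                   # eigenvector of the nonzero eigenvalue
print(abs(v_max @ gradient) / np.linalg.norm(gradient))  # ~1: parallel to the gradient

V0 = eigvecs[:, :-1]                                     # eigenvectors of the (numerically) zero eigenvalues
print(np.max(np.abs(V0.T @ gradient)))                   # ~0: all orthogonal to the gradient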

V.S.e.H.
  • Thanks, then it all makes sense, since the paper specifies that the solution is any norm-1 linear combination of the eigenvectors corresponding to the smallest eigenvalue, which here spans an $(n-1)$-dimensional subspace. – a_guest Jan 25 '22 at 21:08