As written in the title, I have a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ and I want to find the direction of least change. However, I encountered an inconsistency between two different methods which I will outline below.
I start with the directional derivative as given by this answer: $(\nabla f)\cdot v$. This will give the rate of change along any direction $v = \sum_i\alpha_i\vec{e}_i$ with $\sum_i\alpha_i^2=1$. To obtain the coefficients of the vector corresponding to the least change, I want to minimize the square of that expression: $$ \min_{\{\alpha_i\}} \left[(\nabla f)\cdot v\right]^2 $$
This is equivalent to the following quadratic problem: $$ \min_{v} \quad v^T \cdot \underbrace{ \begin{pmatrix} \partial_1\partial_1 & \partial_1\partial_2 & \cdots & \partial_1\partial_n \\ \partial_2\partial_1 & \partial_2\partial_2 & \cdots & \partial_2\partial_n \\ \vdots & \vdots & \ddots & \vdots \\ \partial_n\partial_1 & \partial_n\partial_2 & \cdots & \partial_n\partial_n \\ \end{pmatrix} }_{\equiv A} \cdot v \qquad\mathrm{subject}\;\mathrm{to}\quad \lVert v \rVert = 1 $$ where $\partial_i$ denotes the $i$-th partial derivative of $f$ evaluated at some reference point, i.e. $A$ is the outer product of the gradient with itself.
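As a quick sanity check of this equivalence, here is a minimal sketch with a made-up 3-element gradient (not the one from my actual problem): the quadratic form $v^T A v$ agrees with the squared directional derivative $[(\nabla f)\cdot v]^2$.

```python
import numpy as np

# Hypothetical example gradient (placeholder, not from my real problem)
g = np.array([3.0, -1.0, 2.0])

# A_ij = (partial_i f)(partial_j f): the outer product of the gradient with itself
A = np.outer(g, g)

# A random unit direction v
rng = np.random.default_rng(0)
v = rng.standard_normal(3)
v /= np.linalg.norm(v)

# The quadratic form v^T A v equals the squared directional derivative (g . v)^2
print(v @ A @ v, (g @ v) ** 2)
```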
In order to solve this, I am following the paper William W. Hager, "Minimizing a Quadratic Over a Sphere" (full text available here); Lemmas 2.1 and 2.2 are the relevant ones. In summary, the result is that the direction of least change is given by the eigenvector corresponding to the smallest eigenvalue of $A$. However, the paper does not claim that the smallest eigenvalue must be positive; in fact, Lemma 2.1 only requires that $A - \lambda_1 I$ be positive semidefinite ($\lambda_1$ being the smallest eigenvalue and $I$ the identity matrix).
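A minimal numerical sketch of this result (again with a made-up 3-element gradient, not my real one): taking the eigenvector of the smallest eigenvalue of $A$ and evaluating the quadratic form on it.

```python
import numpy as np

g = np.array([3.0, -1.0, 2.0])   # hypothetical gradient (placeholder)
A = np.outer(g, g)

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(A)
v_min = eigvecs[:, 0]            # eigenvector of the smallest eigenvalue

print(eigvals)                   # the largest eigenvalue is ||g||^2
print(v_min @ A @ v_min)         # the minimum of v^T A v over the unit sphere
```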
The above findings are in agreement with this answer. That answer links to the Wikipedia article on the structure tensor, which appears to coincide with the above matrix $A$ when the window function $w(r)$ is the Dirac delta function. The Wikipedia article, however, states that all eigenvalues of $A$ should be positive (see here; this is also mentioned in this paper, in the paragraph following eq. (16)). The 3D interpretation with an ellipsoid also seems to require $\lambda_i \geq 0$, since the eigenvalues are associated with the semi-axes of the ellipsoid. This would imply that $A$ is already positive semidefinite, which seems to conflict with the statement in Hager's paper.
When I apply the above method, i.e. diagonalizing $A$, to a specific problem ($n = 36$), I indeed find that some of the eigenvalues are negative; $A - \lambda_1 I$, on the other hand, is positive semidefinite. So I'm wondering whether this method is correct, or whether there is something I'm missing. Does it have to do with the dimensionality of the problem (in my specific case $n = 36$)?
The second thing I'm wondering about is that the rate of change is zero whenever $v$ is chosen perpendicular to $\nabla f$. For $n \geq 3$ there exist infinitely many unit vectors perpendicular to any given vector, so this seems to contradict the above finding that the direction of least change is given precisely by one of the eigenvectors.
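To illustrate this point, for $n = 3$ one can explicitly construct a whole circle of unit vectors perpendicular to a given gradient, all with zero rate of change (sketch with a made-up gradient):

```python
import numpy as np

g = np.array([3.0, -1.0, 2.0])            # hypothetical gradient, n = 3

# Two orthonormal vectors spanning the plane perpendicular to g
u1 = np.cross(g, [1.0, 0.0, 0.0])
u1 /= np.linalg.norm(u1)
u2 = np.cross(g, u1)
u2 /= np.linalg.norm(u2)

# Every cos(t)*u1 + sin(t)*u2 is a unit vector along which the rate of change vanishes
for t in np.linspace(0.0, np.pi, 5):
    v = np.cos(t) * u1 + np.sin(t) * u2
    print(g @ v)                          # all of these are ~0
```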
Numerical example
The following are the 36 elements of the gradient, separated by commas, for my specific problem (it's a complex physics simulation, hence I'm just including the gradient here):
-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01
I use SymPy, specifically `Matrix.diagonalize`, in order to obtain the eigenvalues and eigenvectors:
import numpy as np
from sympy import Matrix
gradient = np.array([-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01])
A = Matrix(np.multiply.outer(gradient, gradient))  # A_ij = g_i * g_j
P, D = A.diagonalize(reals_only=True, sort=True, normalize=True)  # A = P*D*P**-1, D diagonal
print('----- Eigenvalues -----')
print(np.diag(D), end='\n\n')
which gives the following output:
----- Eigenvalues -----
[-2.12609948489760e-13 -1.52827730730385e-13 -1.00673222408694e-13
-7.83376902220698e-14 -5.00864285330806e-14 -4.16036346697355e-14
-3.07234097627031e-14 -2.42707863964688e-14 -2.00785824570343e-14
-1.15863543324252e-14 -1.00286213791179e-14 -4.01745072568617e-15
-1.65034450852433e-15 -8.84471783248045e-16 -7.34601713232325e-16
-3.49931683094853e-16 -1.87968744502435e-16 -3.39521826408545e-17
2.39349396454803e-78 9.63838283004546e-17 4.83310171616047e-16
5.70723316062587e-16 1.21622134384129e-15 1.67127258435284e-15
2.52716079698100e-15 3.56914743743705e-15 6.33591850166460e-15
9.44400674233662e-15 2.04608434272058e-14 2.93396570040894e-14
4.82240672003627e-14 5.21039117277875e-14 7.82644482744230e-14
1.11237991654078e-13 2.17480897644168e-13 10853.3563960944]
The first 35 eigenvalues are very small in magnitude compared to the 36th one; some of them are negative and others positive.
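As a cross-check I also ran the same decomposition through NumPy's `np.linalg.eigvalsh` (its eigenvalue solver for symmetric matrices); this is just an alternative route to the same numbers, not an independent result:

```python
import numpy as np

# The same 36-element gradient as above
gradient = np.array([-4.629630012686902774e+01,-1.625366508051229175e+01,-3.641531520770513453e+00,3.641531520770513453e+01,1.354472090042690979e+01,2.953193245502916398e+00,2.629008122312370688e+01,1.239008895481674699e+01,5.195843755245732609e+00,-2.098321516541545861e+01,-8.260059303211164661e+00,-1.776356839400250465e+00,1.050270981295398087e+01,1.243449787580175325e+00,-6.439293542825907934e-01,4.269917752708352054e+01,1.769695501252499525e+01,5.773159728050814010e+00,-2.420286193682841258e+00,1.931788062847772380e+00,1.509903313490212895e+00,-1.521005543736464460e+01,-7.527312106958561344e+00,-2.908784324517910136e+00,3.606004383982508443e+01,1.207922650792170316e+01,2.686739719592878828e+00,2.680078381445127889e+01,1.294520046712932526e+01,4.773959005888173124e+00,-2.051692149507289287e+01,-8.038014698286133353e+00,-1.731947918415244203e+00,9.325873406851314940e+00,1.620925615952728549e+00,-5.995204332975845318e-01])

A = np.outer(gradient, gradient)

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
eigvals = np.linalg.eigvalsh(A)
print(eigvals[-1])                  # the single large eigenvalue, ~||gradient||^2
print(np.abs(eigvals[:-1]).max())   # the other 35 are tiny in magnitude
```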