
I know that the Leven-Marquardt method is widely used in practice. I want to prove to myself that this method always picks a direction of strict descent, provided the Jacobian is of full rank (i.e. the function is differentiable at $x$).

So I went about trying to prove that the following optimization problem always yields a descent direction for the original objective function.

Say the original problem we wish to solve is $$ \min_{x} \frac{1}{2}||f(x)||^2,$$ and write $g(x) = \frac{1}{2}||f(x)||^2$. Leven-Marquardt solves the following minimization problem for the search direction $d$:

$$ \min_{d} \frac{1}{2}||f(x)+J_{f(x)}^T d||^2+\frac{1}{2}r||d||^2$$

The gradient of the original objective function is $f(x)$. As for the gradient of the problem for finding the search direction: if we say $$F(x) =\frac{1}{2}||f(x)+J_{f(x)}^T d||^2+\frac{1}{2}r||d||^2,$$ then $$\nabla F(x)= f(x)^T J_{f(x)} +d^T J_{g(x)}^T J_{g(x)} + \frac{r}{2} d^T,$$ and since $$ J_{g(x)} = f(x)^T J_{f(x)},$$ we can replace that and get $$\nabla F(x)= J_{g(x)} +d^T J_{g(x)}^T J_{g(x)} + \frac{r}{2} d^T. $$

Then we can multiply through by $d$ to get $$\nabla F(x)^T d = J_{g(x)}^T d +d^T d J_{g(x)}^T J_{g(x)} + \frac{r}{2} d^T d. $$

If we are not yet at an optimal value for the original problem, then $$\nabla F(x)^T d \lt 0 \quad \forall d,$$ because the FONC of optimality have not yet been satisfied.

Breaking this down term by term:

$d^T d J_{g(x)}^T J_{g(x)}\gt 0 $ because these are all squared terms, and $ \frac{r}{2} d^T d \gt 0 $ because it is squared as well.

So then $$ J_{g(x)}^T d \lt 0. $$ Hence $d$ will always point in a descent direction.

Is this proof correct? Thanks.

Related: How to introduce Levenberg-Marquardt?

makansij

1 Answer


You got several things wrong. (First, the method is called Levenberg-Marquardt, but that's probably not the point here.)

If the original objective is $G(x) = \tfrac12\|f(x)\|^2$, then its gradient is not $f(x)$, but $$ \nabla G(x) = J_{f(x)}^T f(x). $$ The auxiliary problem for the search direction is a minimization problem over $d$, not over $x$ and should read as $$ \min_d \tfrac12\|f(x) + J_{f(x)} d\|^2 + \tfrac{r}{2}\|d\|^2. $$ Taking the gradient of this objective (with respect to $d$, not to $x$!) gives $$ J_{f(x)}^T(f(x) + J_{f(x)} d) + rd = 0 $$ i.e. $$ (J_{f(x)}^T J_{f(x)} + rI) d = -J_{f(x)}^T f(x). $$ Now we want to show $\langle\nabla G(x),d\rangle<0$: Using the last expression we get $$ \begin{split} \langle\nabla G(x),d\rangle & = \langle J_{f(x)}^T f(x),d\rangle\\ & = -\langle (J_{f(x)}^T J_{f(x)} + rI) d,d\rangle\\ & = -\|J_{f(x)}d\|^2 - r\|d\|^2 < 0 \end{split} $$ as desired.

Dirk