Gradient Descent Proof in quadratic functions

Question

Let $f(x)=\frac{1}{2}x^tQx-b^tx$ be a quadratic function, where $Q$ is symmetric and positive definite. Its gradient is $\nabla f(x)=Qx-b$, and its minimizer $x^*$ is the unique solution of $Qx=b$.

Consider the iteration $$x_{k+1}=x_k- \alpha g_k $$ where $ g_k=\nabla f(x_k)$. We can minimize $f(x_k-\alpha g_k)$ over $\alpha$ by differentiating with respect to $\alpha$ and setting the derivative equal to $0$, which should give the step size $$ \alpha_k=\frac{g_k^tg_k}{g_k^tQg_k}$$ How can I prove this? Thanks in advance.

@user10559479 You can use the chain rule to explicitly compute the derivative of the real function $g(\alpha)=f(x_k+ \alpha g_k)$, in terms of the partial derivatives of $f$. — PierreCarre, Jan 08 '20 at 13:34
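As a sketch of the chain-rule route suggested in this comment, write $\varphi(\alpha)=f(x_k-\alpha g_k)$ for the step used in the question's iteration (the name $\varphi$ is introduced here only to avoid a clash with the gradient $g_k$). Using $\nabla f(x)=Qx-b$ and $g_k=Qx_k-b$:

\begin{align} \varphi'(\alpha) &= -g_k^t\,\nabla f(x_k-\alpha g_k) \\ &= -g_k^t\big(Q(x_k-\alpha g_k)-b\big) \\ &= -g_k^t g_k+\alpha\, g_k^tQg_k. \end{align}

Setting $\varphi'(\alpha)=0$ gives $\alpha=\dfrac{g_k^tg_k}{g_k^tQg_k}$, provided $g_k\neq 0$.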

score 0 · Answer 1 · answered May 06 '22 at 12:16

\begin{align}
f\big(\boldsymbol x^{k+1} \big) = f\big(\boldsymbol x^k - \alpha_k \boldsymbol g^{k}\big)
&= \frac12 \big(\boldsymbol x^k - \alpha_k \boldsymbol g^{k}\big)^T Q \big(\boldsymbol x^k - \alpha_k \boldsymbol g^{k}\big) - \boldsymbol b^T \big(\boldsymbol x^k - \alpha_k \boldsymbol g^{k}\big) \\
&= \frac12 \big(\boldsymbol x^k\big)^T Q \boldsymbol x^k + \frac12 \alpha_k^2 \big(\boldsymbol g^{k}\big)^T Q \boldsymbol g^{k} - \alpha_k \big( \boldsymbol g^k \big)^T Q \boldsymbol x^{k} - \boldsymbol b^T \boldsymbol x^k + \alpha_k\boldsymbol b^T \boldsymbol g^{k} \\
&= f\big(\boldsymbol x^k \big) + \frac12 \alpha_k^2 \big(\boldsymbol g^{k}\big)^T Q \boldsymbol g^{k}- \alpha_k \big( \boldsymbol g^k \big)^T Q \boldsymbol x^{k} + \alpha_k \big( \boldsymbol g^{k}\big)^T \boldsymbol b\\
&= f\big(\boldsymbol x^k \big) + \frac12 \alpha_k^2 \big(\boldsymbol g^{k}\big)^T Q \boldsymbol g^{k}- \alpha_k \big( \boldsymbol g^k \big)^T \Big[Q \boldsymbol x^{k} - \boldsymbol b\Big] =: h(\alpha_k).
\end{align}

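For a fixed $\boldsymbol x^{k}$, $h(\alpha_k)$ is a univariate quadratic in $\alpha_k$. Its leading coefficient $\frac12 \big(\boldsymbol g^{k}\big)^T Q \boldsymbol g^{k}$ is positive whenever $\boldsymbol g^k \neq \boldsymbol 0$, because $Q$ is positive definite, so the parabola opens upward and has a unique minimum (which is what we want) where $h'(\alpha_k) = 0$. Differentiating with respect to $\alpha_k$, one obtains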

\begin{align} 0 &\overset{!}{=} \alpha_k \big( \boldsymbol g^k \big)^T Q \boldsymbol g^{k}- \big( \boldsymbol g^k \big)^T \Big[Q \boldsymbol x^{k} - \boldsymbol b\Big] \\ \Rightarrow \alpha_k &= \frac{ \big( \boldsymbol g^k \big)^T \Big[Q \boldsymbol x^{k} - \boldsymbol b\Big]}{\big( \boldsymbol g^k \big)^T Q \boldsymbol g^{k}}.\end{align} This assumes that $\boldsymbol g^k \neq \boldsymbol 0$, i.e., we are not yet at the minimum. The denominator is then nonzero (in fact positive) because $Q$ is symmetric positive definite.

By the definition of $f$, the gradient at $\boldsymbol x^k$ is $$\boldsymbol g^k = \nabla f\big(\boldsymbol x^k\big) = Q \boldsymbol x^k - \boldsymbol b,$$ so the bracketed term in the numerator is exactly $\boldsymbol g^k$, and thus \begin{align} \alpha_k &= \frac{ \big( \boldsymbol g^k \big)^T \boldsymbol g^k}{\big( \boldsymbol g^k \big)^T Q \boldsymbol g^{k}}\end{align} as desired.
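The result can also be checked numerically. The following is a minimal sketch, not part of the original answer: it assumes NumPy, a randomly generated symmetric positive definite $Q$, and illustrative helper names (`exact_step`, `steepest_descent`) that are made up for this example. It compares the closed-form $\alpha_k$ with a brute-force search along the ray $\boldsymbol x^k - \alpha \boldsymbol g^k$, and then runs the iteration to confirm that it approaches the solution of $Qx=b$.

```python
import numpy as np


def exact_step(Q, g):
    """Closed-form step size alpha = (g^T g) / (g^T Q g); requires g != 0."""
    return float(g @ g) / float(g @ Q @ g)


def steepest_descent(Q, b, x0, tol=1e-10, max_iter=10_000):
    """Gradient descent with exact line search for f(x) = 0.5 x^T Q x - b^T x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = Q @ x - b                    # gradient at the current iterate
        if np.linalg.norm(g) <= tol:     # stop once the gradient is (numerically) zero
            break
        x = x - exact_step(Q, g) * g
    return x


# Hypothetical test problem: Q = A A^T + 4 I is symmetric positive definite.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4.0 * np.eye(4)
b = rng.standard_normal(4)
x0 = rng.standard_normal(4)


def f(x):
    return 0.5 * x @ Q @ x - b @ x


g0 = Q @ x0 - b
alpha_star = exact_step(Q, g0)

# Brute-force check: sample h(alpha) = f(x0 - alpha * g0) on a grid around alpha_star.
alphas = np.linspace(0.0, 2.0 * alpha_star, 2001)
values = np.array([f(x0 - a * g0) for a in alphas])
alpha_grid = alphas[np.argmin(values)]

x_min = steepest_descent(Q, b, x0)

print("closed-form alpha:", alpha_star)
print("grid-search alpha:", alpha_grid)
print("distance to solve(Q, b):", np.linalg.norm(x_min - np.linalg.solve(Q, b)))
```

The grid search only confirms the formula up to the grid spacing. A sharper check is that the new gradient $Q(\boldsymbol x^k - \alpha_k \boldsymbol g^k) - \boldsymbol b$ is orthogonal to $\boldsymbol g^k$, which is precisely the condition $h'(\alpha_k)=0$.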
