
As we all know, the iterative process of the SGD algorithm is:

$x^{k}=x^{k-1}-\alpha_{k}\nabla f_{i_{k}}(x^{k-1})$
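
For concreteness, here is a minimal runnable sketch of this update (a hypothetical least-squares instance; the data $A, b$, the step-size schedule, and the iteration count are illustrative assumptions, not part of the question):

```python
import numpy as np

# Minimal sketch of the update x^k = x^{k-1} - alpha_k * grad f_{i_k}(x^{k-1}).
# The least-squares components f_i(x) = (1/2)(a_i^T x - b_i)^2, the data (A, b),
# and the step-size schedule below are all illustrative assumptions.
rng = np.random.default_rng(0)
N, d = 100, 5
A = rng.standard_normal((N, d))
b = rng.standard_normal(N)

def grad_fi(x, i):
    # gradient of the single component f_i at x
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)                       # x^0
for k in range(1, 1001):
    i_k = rng.integers(N)             # index selected uniformly at random
    alpha_k = 1.0 / (10 + k)          # an assumed diminishing step size
    x = x - alpha_k * grad_fi(x, i_k)
```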

Let $f(x)=\frac{1}{N}\sum_{i=1}^{N} f_{i}(x)$, where each $f_{i}(x)$ is a differentiable function and the gradient of $f(x)$ is $L$-Lipschitz continuous. $\{x^{k}\}$ is the iterative sequence generated by the stochastic gradient descent method, and $s_{k}$ (the $i_{k}$ in the update above) is the index randomly selected at the $k$-th step. I want to prove:

$\mathbb{E}\left[\left\|\nabla f_{s_{k}}\left(x^{k}\right)\right\|^{2}\right] \leqslant \mathbb{E}\left[\left\|x^{k}-x^{*}\right\|^{2}\right]+\alpha_{k} \mathbb{E}\left[\left\|\nabla f_{s_{k}}\left(x^{k}\right)-\nabla f\left(x^{k}\right)\right\|^{2}\right]$

where $x^{*}$ is a minimizer of $f(x)$ and $\alpha_{k}$ is the step size at the $k$-th step.
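
What I have tried: assuming the index $s_{k}$ is sampled uniformly, so that $\mathbb{E}\left[\nabla f_{s_{k}}(x^{k}) \mid x^{k}\right]=\nabla f(x^{k})$, the conditional bias-variance decomposition gives

$\mathbb{E}\left[\left\|\nabla f_{s_{k}}(x^{k})\right\|^{2}\right]=\mathbb{E}\left[\left\|\nabla f(x^{k})\right\|^{2}\right]+\mathbb{E}\left[\left\|\nabla f_{s_{k}}(x^{k})-\nabla f(x^{k})\right\|^{2}\right]$

and, since $\nabla f(x^{*})=0$ and $\nabla f$ is $L$-Lipschitz, $\left\|\nabla f(x^{k})\right\| \leqslant L\left\|x^{k}-x^{*}\right\|$, so the first term is at most $L^{2}\,\mathbb{E}\left[\left\|x^{k}-x^{*}\right\|^{2}\right]$. This is close to the claimed inequality, but I cannot recover the constants $1$ and $\alpha_{k}$ from it.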

  • "As we all know," maybe in your sphere... I happen to know a little about Gradient Descent method(s), but you should give more context, helping for example to understand the dimensionalities of the spaces, the mysterious double indexing in $f_{i_k}$, etc. – Jean Marie May 30 '21 at 12:31

0 Answers