
When coding batch gradient descent, convergence is easy to detect: after each iteration the cost moves toward the minimum, and once the change in cost falls below a pre-defined threshold, we stop iterating and conclude that gradient descent has converged. In stochastic GD, however, the cost tends to wander around the local minimum rather than decrease monotonically. Setting a threshold on the change in cost doesn't work here, because SGD fluctuates a lot instead of moving steadily toward convergence. When coding this in Python, how do I know the number of iterations at which the cost tends to be at its minimum?

Peter
  • If you are updating weights after each training sample, you can calculate the average cost of the training samples in each epoch in order to determine whether the algorithm has converged. – cap Nov 27 '19 at 20:37
  • With SGD, do you mean pure Stochastic GD (when you feed one observation at a time), or mini batch GD (when you feed a batch of data of size n at each iteration)? I asked because different sources use these two labels interchangeably, but their interpretations are very different. – Leevo Jan 02 '20 at 08:59

1 Answer


Given the fluctuations inherent in stochastic gradient descent, either average the cost over several recent iterations and track that moving average, or set an epsilon value for minimal improvement below which you stop.
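A minimal sketch of the averaging idea in Python: run pure SGD on a toy linear-regression problem, accumulate the per-sample cost over a window of recent updates, and stop once the improvement between consecutive window averages drops below epsilon. The parameter names (`lr`, `epsilon`, `window`) are illustrative choices, not from the question.

```python
import numpy as np

def sgd_converged(X, y, lr=0.01, epsilon=1e-6, window=100,
                  max_iters=100_000, seed=0):
    """Fit y ~ X @ w with pure SGD (one sample per update).

    Convergence is declared when the cost, averaged over the last
    `window` updates, stops improving by more than `epsilon`.
    Returns the weights and the number of iterations used.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    recent = []          # per-sample costs in the current window
    prev_avg = np.inf    # average cost of the previous window
    for it in range(max_iters):
        i = rng.integers(n)              # pick one random training sample
        err = X[i] @ w - y[i]
        w -= lr * err * X[i]             # SGD update on that sample
        recent.append(0.5 * err ** 2)    # squared-error cost of this sample
        if len(recent) == window:
            avg = np.mean(recent)        # smooth out SGD's fluctuations
            if prev_avg - avg < epsilon: # improvement below threshold: stop
                return w, it + 1
            prev_avg, recent = avg, []
    return w, max_iters
```

The key point is that the stopping test compares *averaged* costs rather than consecutive per-sample costs, which is what makes a threshold usable despite the noise of individual updates.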

Brian Spiering