
Hi, I was reading about the difference between GD and SGD and found the link below.

What is the difference between Gradient Descent and Stochastic Gradient Descent?

Based on this information, I wanted to understand how SGD would train in the scenario below:

Say we have a dataset with 10,000 rows and 45 predictors. Now, since SGD trains each predictor on one example (1 of the 10,000), does that mean it uses only 45 examples in total to train the 45 predictors?

I'd really appreciate a clear explanation of this scenario.

Thanks!

2 Answers


In a given iteration of the stochastic gradient descent algorithm, all 45 predictors are updated using a randomly drawn subset of your 10,000-observation sample. This subset may consist of only 1 observation, but a mini-batch of moderate size is more common; the batch size is a hyperparameter you can tune, for example by cross-validation. You could even try randomly generating different subset sizes each iteration.
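To make this concrete, here is a minimal sketch of mini-batch SGD (not from the answer itself) assuming a linear model with squared-error loss; the function name and default values are illustrative:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=5, seed=0):
    """Mini-batch SGD for linear regression (squared-error loss).

    Every step updates ALL coefficients in w at once, using only a
    randomly drawn subset of the rows of X.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape            # e.g. n = 10000 rows, p = 45 predictors
    w = np.zeros(p)
    for _ in range(epochs):
        for _ in range(n // batch_size):
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            # gradient of the mean squared error over the mini-batch
            grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
            w -= lr * grad    # all p parameters move in one step
    return w
```

Note that every iteration touches the full 45-dimensional weight vector; the subset size only controls how many rows feed into each gradient estimate.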

gs.co

Each stochastic gradient descent step updates all 45 model parameters using a single training example.

Mathematically, the gradient step in SGD can be represented as

$$\theta := \theta - \eta \, \nabla J(\theta; x^{(i)}, y^{(i)})$$

where $(x^{(i)}, y^{(i)})$ is a single randomly chosen training example and $\eta$ is the learning rate.
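A minimal sketch of this per-example update (my own illustration, assuming a linear model with squared-error loss; names and defaults are hypothetical):

```python
import numpy as np

def sgd_one_example(X, y, lr=0.01, epochs=3, seed=0):
    """Plain SGD: each step uses exactly ONE (x_i, y_i) pair,
    yet still updates the full weight vector (all 45 parameters)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        for i in rng.permutation(n):   # visit every row once per epoch
            err = X[i] @ w - y[i]      # scalar residual for one example
            w -= lr * 2 * err * X[i]   # gradient step moves all p entries
    return w
```

With 10,000 rows, one epoch therefore performs 10,000 parameter updates, each driven by a single example; the 45 parameters are never trained separately.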

References:

https://en.m.wikipedia.org/wiki/Stochastic_gradient_descent

Amit Rastogi