
Hi, I was reading about the difference between GD and SGD and found the link below.

What is the difference between Gradient Descent and Stochastic Gradient Descent?

Based on this information, I wanted to understand how SGD would train in the scenario below:

Say we have a dataset with 10,000 rows and 45 predictors. Now, since SGD trains each predictor on one example (1 of the 10,000), does that mean it uses only 45 examples in total to train the 45 predictors?

I'd really appreciate a clear explanation of this scenario.

Thanks!

2 Answers


In a given iteration of the stochastic gradient descent algorithm, all 45 predictors are updated using a randomly drawn subset of your 10,000-observation sample. This subset may consist of only 1 observation, but a mini-batch of moderate size is more common; the batch size is a hyperparameter you can tune, for example by cross-validation. You could even try randomly generating different subset sizes each iteration.
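To make this concrete, here is a minimal sketch of mini-batch SGD (not from the answer itself) assuming a linear model with squared-error loss; the function name and default values are illustrative:

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.01, batch_size=32, epochs=5, seed=0):
    """Mini-batch SGD for linear regression (squared-error loss).

    Every step updates ALL coefficients in w at once, using only a
    randomly drawn subset of the rows of X.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape            # e.g. n = 10000 rows, p = 45 predictors
    w = np.zeros(p)
    for _ in range(epochs):
        for _ in range(n // batch_size):
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            # gradient of the mean squared error over the mini-batch
            grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size
            w -= lr * grad    # all p parameters move in one step
    return w
```

Note that every iteration touches the full 45-dimensional weight vector; the subset size only controls how many rows feed into each gradient estimate.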

gs.co

Each stochastic gradient descent step updates all 45 model parameters using a single training example.

Mathematically, the gradient step in SGD can be represented as

$$\theta := \theta - \eta \, \nabla J(\theta; x^{(i)}, y^{(i)})$$

where $(x^{(i)}, y^{(i)})$ is a single randomly chosen training example and $\eta$ is the learning rate.
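A minimal sketch of this per-example update (my own illustration, assuming a linear model with squared-error loss; names and defaults are hypothetical):

```python
import numpy as np

def sgd_one_example(X, y, lr=0.01, epochs=3, seed=0):
    """Plain SGD: each step uses exactly ONE (x_i, y_i) pair,
    yet still updates the full weight vector (all 45 parameters)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(epochs):
        for i in rng.permutation(n):   # visit every row once per epoch
            err = X[i] @ w - y[i]      # scalar residual for one example
            w -= lr * 2 * err * X[i]   # gradient step moves all p entries
    return w
```

With 10,000 rows, one epoch therefore performs 10,000 parameter updates, each driven by a single example; the 45 parameters are never trained separately.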

References:

https://en.m.wikipedia.org/wiki/Stochastic_gradient_descent

Amit Rastogi