Hi, I was reading about the difference between gradient descent (GD) and stochastic gradient descent (SGD) and found the link below:
[What is the difference between Gradient Descent and Stochastic Gradient Descent?]
Based on that information, I want to understand how SGD would train in the following scenario:
Say we have a dataset with 10000 rows and 45 predictors. Now, since SGD trains each predictor on one example (1 of the 10000), does that mean it uses only 45 examples in total to train the 45 predictors?
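To make my confusion concrete, here is a minimal NumPy sketch of how I picture one pass of SGD over such a dataset (linear regression with squared error on synthetic data; the data, learning rate, and model are my own illustration, not from the linked post):

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_predictors = 10000, 45

# Synthetic data: 10000 rows, 45 predictors, linear targets plus noise
X = rng.normal(size=(n_rows, n_predictors))
true_w = rng.normal(size=n_predictors)
y = X @ true_w + 0.01 * rng.normal(size=n_rows)

w = np.zeros(n_predictors)  # one weight per predictor
lr = 0.01
for i in range(n_rows):     # one epoch = 10000 single-example updates
    xi, yi = X[i], y[i]
    grad = (xi @ w - yi) * xi  # squared-error gradient on ONE example
    w -= lr * grad             # this step updates ALL 45 weights at once
```

In this sketch, every one of the 10000 examples is used, and each single-example step updates all 45 weights together; is that the correct picture, or does SGD really assign one example per predictor as I described above?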
I'd really appreciate a clear explanation of this scenario.
Thanks!