Relation between the number of parameters and the number of features in the gradient descent algorithm

Asked Jun 25 '23 at 02:23

My book gives the gradient descent update rule as an equation for finding the $\theta$ values that minimize the cost function, but I have a few questions about the intuition behind it:

1. The book describes $j$ as the number of features. If we have to compute a $\theta$ value for every $j$, does this mean that the number of features $(x_1, x_2, \dots)$ is equal to the number of parameters $(\theta_1, \theta_2, \dots)$?
2. How are the initial $\theta$ and $\alpha$ values selected? What happens if the selected values are too low or too high?
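
For reference, the rule in question is presumably the standard gradient descent update. The equation itself is not reproduced in the post, so the form below is an assumption, written for a linear hypothesis $h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$ with the convention $x_0 = 1$ and the least-squares cost $J(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta), \qquad j = 0, 1, \dots, d$$

$$\theta_j := \theta_j + \alpha \sum_{i=1}^{n} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$$

Under that convention $j$ is an index rather than a count: $d$ features $x_1, \dots, x_d$ go with $d + 1$ parameters $\theta_0, \dots, \theta_d$, where $\theta_0$ is the intercept paired with the constant $x_0 = 1$.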

If anyone could clear up my confusion, that would be great. Thanks.
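
To make the two questions concrete, here is a minimal NumPy sketch of batch gradient descent for least-squares linear regression. It is an illustration under stated assumptions, not the book's code: the function name `batch_gradient_descent`, the synthetic data, and the specific step sizes are all made up for this example, and the gradient is averaged over the samples (which only rescales $\alpha$). It suggests that $d$ features lead to $d + 1$ parameters once an intercept is included, that the starting $\theta$ matters little for this convex cost, and that $\alpha$ matters a lot.

```python
import numpy as np


def batch_gradient_descent(X, y, alpha, n_steps, theta0=None):
    """Gradient descent on the mean squared error of a linear model.

    X has shape (n, d) and does NOT include the intercept column;
    a column of ones (x_0 = 1) is prepended here, so theta has d + 1 entries.
    """
    n, d = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    if theta0 is None:
        theta = np.zeros(d + 1)  # a common default starting point
    else:
        theta = np.asarray(theta0, dtype=float).copy()
    for _ in range(n_steps):
        residual = Xb @ theta - y          # h_theta(x) - y for every sample
        grad = Xb.T @ residual / n         # gradient w.r.t. every theta_j
        theta = theta - alpha * grad       # simultaneous update of all theta_j
    return theta


# Hypothetical noise-free data with d = 2 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_theta = np.array([1.0, 2.0, -3.0])    # intercept plus one weight per feature
y = true_theta[0] + X @ true_theta[1:]

# A moderate step size converges, from zeros or from a far-away start.
theta_good = batch_gradient_descent(X, y, alpha=0.1, n_steps=2000)
theta_far = batch_gradient_descent(
    X, y, alpha=0.1, n_steps=2000, theta0=[10.0, -10.0, 10.0]
)

# Too small: barely moves in 100 steps. Too large: the iterates blow up.
theta_slow = batch_gradient_descent(X, y, alpha=1e-4, n_steps=100)
theta_big = batch_gradient_descent(X, y, alpha=5.0, n_steps=200)

print("parameters:", theta_good.shape[0], "features:", X.shape[1])
print("error (alpha=0.1):", np.linalg.norm(theta_good - true_theta))
print("error (alpha=1e-4):", np.linalg.norm(theta_slow - true_theta))
print("error (alpha=5.0):", np.linalg.norm(theta_big - true_theta))
```

In practice $\alpha$ is usually chosen by trying a few values on a logarithmic grid and watching whether the cost decreases; for non-convex models the starting $\theta$ can matter much more than it does here.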

someman112

Which book are you using? Please also specify the page, chapter, or section. It might make sense to learn one parameter for each feature, but we probably need more context. – nbro Jun 26 '23 at 22:57
@nbro The course notes are here: https://cs229.stanford.edu/lectures-spring2022/main_notes.pdf (please refer to page 10). – someman112 Jun 27 '23 at 00:42

0 Answers