
[Image: the equation being asked about]

My book describes this as an equation for minimizing the $\theta$ value, but I have a few questions about the intuition behind it:

  1. The book describes $j$ as the number of features. If we have to compute the $\theta$ value for every $j$, does this mean that the number of features $(x_1, x_2, \dots)$ is equal to the number of parameters $(\theta_1, \theta_2, \dots)$?

  2. How are the initial $\theta$ and $\alpha$ values selected? What happens if the selected initial values are too low or too high?

If anyone could clear up my confusion, that would be great. Thanks.

  • Which book are you using? Please also specify the page, chapter, or section. It might make sense to learn one parameter for each feature, but we probably need more context. – nbro Jun 26 '23 at 22:57
  • @nbro Course notes here: https://cs229.stanford.edu/lectures-spring2022/main_notes.pdf (please refer to page 10) – someman112 Jun 27 '23 at 00:42
