
In the lecture notes here: http://web.stanford.edu/class/archive/ee/ee263/ee263.1082/lectures/ls.pdf

the derivation for least squares starts from the objective $\|Ax - y\|^2$, written out as

$x^\intercal A^\intercal Ax - 2 y^\intercal Ax + y^\intercal y $

Set the gradient with respect to $x$ to $0$:

$2A^\intercal Ax - 2A^\intercal y = 0$.

How did this step come about? Does gradient mean derivative here? And what are some good resources for learning matrix calculus (more than Wikipedia, I want to develop intuition)? Thanks!

  • Are you asking what the definition of "gradient" is? – 5xum Jan 22 '16 at 07:41
  • Yes, sorry. I think I know what it means, but I'm not sure. I guess my question is partially answered here: http://math.stackexchange.com/questions/369694/matrix-calculus-in-least-square-method?rq=1. My biggest question is how I go about learning these techniques. – Michael Zhang Jan 22 '16 at 07:47
  • Learn multivariable calculus. – Algebraic Pavel Jan 22 '16 at 07:48
  • If you only think you know what a gradient is, then you should first take a course that covers gradients (which would be something like analysis 2) – 5xum Jan 22 '16 at 07:49
  • Any recommended textbooks/resources? – Michael Zhang Jan 22 '16 at 07:51
  • Any introduction to multivariable calculus should do. But I would recommend actually taking a course. – 5xum Jan 22 '16 at 08:05

1 Answer


The gradient is the generalisation of the first derivative for a function of multiple variables. It is a vector whose component in the $i$-th coordinate direction is the $i$-th partial derivative.
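
For example (a small illustration, not from the linked notes): for $f(x_1, x_2) = x_1^2 + 3x_1 x_2$ the partial derivatives are $\partial f/\partial x_1 = 2x_1 + 3x_2$ and $\partial f/\partial x_2 = 3x_1$, so $\nabla f(x_1, x_2) = (2x_1 + 3x_2,\ 3x_1)$.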

In one variable, if a differentiable function has an extreme value at some point then its derivative must be zero there.
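
For instance, $g(x) = (x-2)^2$ has $g'(x) = 2(x-2)$, which is zero exactly at the minimiser $x = 2$.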

In several variables, if the function has an extreme value at some point then all partial derivatives must be zero there; in other words, the gradient is the zero vector.
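
Applied to the expression in your question (a sketch of the computation, using the standard identities $\nabla_x(x^\intercal M x) = (M + M^\intercal)x$ and $\nabla_x(b^\intercal x) = b$; the presentation is mine, not copied from the linked notes): since $A^\intercal A$ is symmetric, $\nabla_x(x^\intercal A^\intercal A x) = 2A^\intercal A x$; writing $2y^\intercal A x = (2A^\intercal y)^\intercal x$ gives $\nabla_x(2y^\intercal A x) = 2A^\intercal y$; and $y^\intercal y$ does not depend on $x$. Hence

$\nabla_x\left(x^\intercal A^\intercal A x - 2y^\intercal A x + y^\intercal y\right) = 2A^\intercal A x - 2A^\intercal y$

and setting this to zero gives the normal equations $A^\intercal A x = A^\intercal y$.

If you want to build intuition, you can also check such gradient formulas numerically (a quick sanity check, assuming NumPy is available; the sizes and variable names below are just placeholders):

    import numpy as np

    # random test problem: A is 5x3, y has 5 entries, x has 3 entries
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    y = rng.standard_normal(5)
    x = rng.standard_normal(3)

    # the objective x^T A^T A x - 2 y^T A x + y^T y, i.e. ||Ax - y||^2
    def f(x):
        return x @ A.T @ A @ x - 2 * y @ A @ x + y @ y

    # analytic gradient from the derivation above
    analytic = 2 * A.T @ A @ x - 2 * A.T @ y

    # central finite-difference approximation of the gradient
    eps = 1e-6
    numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                        for e in np.eye(3)])

    print(np.allclose(analytic, numeric, atol=1e-5))  # expected: True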

Justpassingby