Edit: Possible error in the book? See bottom

I'm reading "Functional Data Analysis" by Ramsay & Silverman. The text contains the following (p. 87), regarding how smoothing spline coefficients are calculated: Original Statement. I can't seem to follow the logic. What I have so far:

$$\frac{d}{dc} PENSSE = \frac{d}{dc} [(y-\Phi c)'W(y-\Phi c)+\lambda c'Rc]$$

Definitions are given elsewhere in the book as:
$y$ is a column vector of size n and $c$ is a column vector of size k
$\Phi$ is a matrix of size [n,k]
$W$ is, to my understanding, a matrix of size [n,n] by definition, and is said elsewhere to be symmetric positive definite
$\lambda$ is a scalar, and finally
$R$ is a matrix size [k,k] and is said elsewhere to be positive semidefinite.

To my understanding:

$$\frac{d}{dc} PENSSE = \frac{d}{dc} [(y-\Phi c)'W(y-\Phi c)+\lambda c'Rc]$$

$$\frac{d}{dc} PENSSE = \frac{d}{dc}[(y'W-c'\Phi'W)(y-\Phi c) + \lambda c'Rc]$$

$$\frac{d}{dc} PENSSE = \frac{d}{dc}[y'Wy-c'\Phi'Wy- y'W\Phi c+c'\Phi'W\Phi c+\lambda c'Rc]$$

Remove the term without $c$:

$$\frac{d}{dc} PENSSE = \frac{d}{dc}[-c'\Phi'Wy- y'W\Phi c+c'\Phi'W\Phi c + \lambda c'Rc]$$

The second term on the right is a scalar, so transposing it does not change its value. Since $W$ is symmetric, $y'W\Phi c = c'\Phi'Wy$, which helps collect terms:

$$\frac{d}{dc} PENSSE = \frac{d}{dc}[-2c'\Phi'Wy+c'\Phi'W\Phi c +\lambda c'Rc]$$

$$\frac{d}{dc} PENSSE = \frac{d}{dc}[-2c'\Phi'Wy]+\frac{d}{dc}[c'\Phi'W\Phi c]+\frac{d}{dc}[\lambda c'Rc]$$

Which should, according to the text, result in:

$$-2\Phi'Wy+\Phi'W\Phi c +\lambda Rc$$

So my question, finally, is as follows: why do the second and third terms work out the way they do? The first term is easy: differentiating with respect to a vector means removing the vector. When differentiating an expression with multiple occurrences of the vector, shouldn't there be a '2' in the derivative?

Edit: from proposition 9 in this link, it seems my intuition was right, and all terms should eventually have a coefficient of 2, which can be divided out. This is in line with the book's resulting statement: $$c = (\Phi'W\Phi+\lambda R)^{-1}\Phi'Wy$$
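
A quick numerical check can make the factor of 2 concrete. The sketch below is not from the book: the sizes, the random stand-ins for $\Phi$, $W$, $R$ and $y$, and the helper names (`pensse`, `grad_with_twos`, `grad_as_printed`, `numerical_grad`) are all illustrative assumptions. It compares a central finite-difference gradient of PENSSE against the expression with a 2 on every term and against the expression as printed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4                       # assumed sizes: n observations, k basis functions
Phi = rng.normal(size=(n, k))
y = rng.normal(size=n)
A = rng.normal(size=(n, n))
W = A @ A.T + n * np.eye(n)       # symmetric positive definite by construction
B = rng.normal(size=(k, k))
R = B @ B.T                       # symmetric positive semidefinite by construction
lam = 0.5

def pensse(c):
    """Penalized weighted sum of squares (y - Phi c)' W (y - Phi c) + lam c' R c."""
    r = y - Phi @ c
    return r @ W @ r + lam * c @ R @ c

def grad_with_twos(c):
    """Candidate gradient with a factor of 2 on every term (uses symmetry of W and R)."""
    return -2 * Phi.T @ W @ y + 2 * Phi.T @ W @ Phi @ c + 2 * lam * R @ c

def grad_as_printed(c):
    """The expression as quoted from the text, with a 2 on the first term only."""
    return -2 * Phi.T @ W @ y + Phi.T @ W @ Phi @ c + lam * R @ c

def numerical_grad(f, c, h=1e-5):
    """Central finite-difference approximation of the gradient of f at c."""
    g = np.zeros_like(c)
    for i in range(c.size):
        e = np.zeros_like(c)
        e[i] = h
        g[i] = (f(c + e) - f(c - e)) / (2 * h)
    return g

c0 = rng.normal(size=k)
g_num = numerical_grad(pensse, c0)
print("matches version with 2s:", np.allclose(g_num, grad_with_twos(c0), rtol=1e-5, atol=1e-5))
print("matches printed version:", np.allclose(g_num, grad_as_printed(c0), rtol=1e-5, atol=1e-5))
```

Under these assumptions the finite-difference gradient should agree with the version that has a 2 on every term and disagree with the printed one, which is consistent with proposition 9.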

1 Answer

For ease of notation, I will introduce the Frobenius inner product:

$$ A:B = \operatorname{tr}(A^TB)$$

with the following properties, derived from the underlying trace function:

$$\eqalign{A:BC &= B^TA:C\cr &= AC^T:B\cr &= A^T:(BC)^T\cr &= BC:A \cr } $$
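
These rearrangement rules are easy to spot-check numerically. The following is a minimal sketch, assuming NumPy and arbitrary compatible shapes; the helper name `frob` is made up for illustration.

```python
import numpy as np

def frob(A, B):
    """Frobenius inner product A:B = tr(A^T B); A and B must have the same shape."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(3, 4))
C = rng.normal(size=(4, 5))       # so that B @ C has the same shape as A

lhs = frob(A, B @ C)
rhs = [
    frob(B.T @ A, C),             # A:BC = B^T A : C
    frob(A @ C.T, B),             # A:BC = A C^T : B
    frob(A.T, (B @ C).T),         # A:BC = A^T : (BC)^T
    frob(B @ C, A),               # A:BC = BC : A
]
print(all(np.isclose(lhs, r) for r in rhs))
```

For vectors the same product reduces to the ordinary dot product, which is why $x:Ax$ is the quadratic form $x^TAx$.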

Let's use a fact you already know:

$$\eqalign{ f &= x:Ax \cr df &= (A+A^T)x : dx}$$

Thus finding the differential and gradient is straightforward.
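
As a sanity check on this differential, the sketch below (an illustration with an assumed random, deliberately non-symmetric $A$) compares the actual change in $f$ with the first-order prediction. For a quadratic form the only remainder is the second-order term $dx : A\,dx$, so the identity can be tested exactly up to rounding.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 5
A = rng.normal(size=(m, m))        # deliberately non-symmetric
x = rng.normal(size=m)
dx = 1e-6 * rng.normal(size=m)     # small perturbation

def f(v):
    """Quadratic form f(v) = v : A v = v^T A v."""
    return v @ A @ v

actual = f(x + dx) - f(x)
first_order = ((A + A.T) @ x) @ dx # the differential (A + A^T) x : dx
second_order = dx @ A @ dx         # exact remainder for a quadratic form

print(np.isclose(actual, first_order + second_order))
```

When $A$ is symmetric, $(A+A^T)x$ collapses to $2Ax$, which is where the factor of 2 comes from.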

With: $$\eqalign{u&=(y-\Phi c) \\ du &= -\Phi dc }$$

your function becomes:

$$\eqalign{ f &= u : Wu + \lambda c : Rc \cr df &= (W+W^T)u : du + \lambda (R+R^T) c : dc \cr &= (W+W^T)(y-\Phi c) : -\Phi dc + \lambda (R+R^T) c: dc \cr &= -2W(y-\Phi c) : \Phi dc + \lambda (R+R^T) c: dc \cr &= -2\Phi^T W (y- \Phi c) : dc + \lambda(R+R^T) c : dc \cr &= (-2\Phi^T Wy + 2\Phi^TW \Phi c + \lambda(R+R^T)c) : dc }$$

Here the fourth line uses the symmetry of $W$, so that $W+W^T = 2W$. Thus the gradient can be identified as:

$$\frac{\partial f}{\partial c} = -2\Phi^T Wy + 2\Phi^TW \Phi c + \lambda(R+R^T)c$$

Equating the gradient to zero and solving for c gives:

$$c = (2\Phi^TW\Phi + \lambda(R+R^T))^{-1} (2 \Phi^TWy) $$

If $R$ is also symmetric, the factor of 2 cancels and you get the expression quoted in your edit:

$$c = (\Phi^TW\Phi+\lambda R)^{-1}\Phi^TWy$$
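
To see the closed form in action, here is a minimal sketch under assumed random inputs; `fit_coefficients` and `gradient` are hypothetical helper names, not functions from any FDA package. It solves the linear system rather than forming the inverse explicitly, and it symmetrizes $R$ so that the same code also covers the general expression with $R+R^T$.

```python
import numpy as np

def fit_coefficients(Phi, W, R, y, lam):
    """Solve (Phi^T W Phi + lam * R_sym) c = Phi^T W y, where R_sym = (R + R^T) / 2."""
    R_sym = 0.5 * (R + R.T)        # equals R when R is already symmetric
    lhs = Phi.T @ W @ Phi + lam * R_sym
    rhs = Phi.T @ W @ y
    return np.linalg.solve(lhs, rhs)

def gradient(Phi, W, R, y, lam, c):
    """Gradient derived above: -2 Phi^T W y + 2 Phi^T W Phi c + lam (R + R^T) c."""
    return -2 * Phi.T @ W @ y + 2 * Phi.T @ W @ Phi @ c + lam * (R + R.T) @ c

rng = np.random.default_rng(3)
n, k = 10, 4                       # assumed sizes for illustration
Phi = rng.normal(size=(n, k))
y = rng.normal(size=n)
W = np.eye(n)                      # unweighted case
B = rng.normal(size=(k, k))
R = B @ B.T                        # symmetric positive semidefinite
lam = 0.1

c_hat = fit_coefficients(Phi, W, R, y, lam)
print(np.allclose(gradient(Phi, W, R, y, lam, c_hat), 0, atol=1e-8))
```

Using `np.linalg.solve` instead of an explicit inverse is a design choice for numerical stability; the result is the same coefficient vector.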

Traws
  • Interesting! I'm not familiar with the concept of Frobenius inner product. It seems very relevant, but I'll have to spend a little bit more time taking it in. Is this the standard way to approach such derivatives, or are there other ways? Either way - thank you very much for your reply, and I'm happy to see I was right and the error is in the book... – BestBoyCoop Apr 21 '19 at 15:21
  • The final result is correct, but you have a sign error on the $\Phi^TW\Phi c$ term in the intermediate steps. – greg Apr 21 '19 at 16:10
  • Oops. Thanks @greg, corrected. – Traws Apr 21 '19 at 16:15