
If the cost function is $L$, $$ L = -\frac{1}{m}\Big( y\log(h(x)) + (1-y)\log(1-h(x)) \Big) $$ $$ h(x)=\frac{1}{1+e^{-(w^{T}x+b)}} $$ then the first-order partial derivative of $L$ with respect to $w$ is $$ \frac{\partial L}{\partial w} = \frac{1}{m} \, ( h(x) - y )\,x $$
Question:
How do I find the second-order partial derivative of $L$ with respect to $w$, that is, $$ \frac{\partial ^{2}L}{\partial w^{2}}, $$
so that I can apply Newton's method and update the weights $w$ like this: $$ w_{new} = w_{old} - \left(\frac{\partial ^{2}L}{\partial w^{2}}\right)^{-1} \left( \frac{\partial L}{\partial w}\right) $$
I am just trying to figure out how Newton's method works with logistic regression.
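For concreteness, here is how I picture one update in code (a minimal numpy sketch of my own; `newton_step` just takes the gradient $g$ and Hessian $H$ as given):

```python
import numpy as np

def newton_step(w_old, g, H):
    """One Newton update: w_new = w_old - H^{-1} g."""
    # Solving H d = g is numerically preferable to forming H^{-1} explicitly.
    d = np.linalg.solve(H, g)
    return w_old - d
```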

guru_007
  • 111

3 Answers

2

It is not clear that you have grasped these concepts yet.

You should clarify inputs and outputs. What you seem to have done is calculate the second derivative of a scalar-valued function of one variable, in other words a $$\mathbb R^{1} \to \mathbb R^{1}$$ function. A Jacobian collects all the partial derivatives with respect to all the input variables. For a function $$\cases{x \in \mathbb R^n\\f(x) \in \mathbb R^m}$$

you get a Jacobian that is an $m\times n$ matrix (one row per output component, one column per input variable).

For the Hessian to be a matrix, we would need the function $f(x)$ to be $$\mathbb R^{n} \to \mathbb R^{1};$$

in the more general case

$$\mathbb R^{n} \to \mathbb R^{m}$$

it will be a 3-index tensor.
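A quick way to see these shapes numerically (a small numpy sketch; the forward-difference helper and the example function are made up for illustration):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m, returned as an m x n array."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.atleast_1d(f(xp)) - fx) / h
    return J

# f: R^3 -> R^2, so the Jacobian is 2 x 3
f = lambda x: np.array([x[0] * x[1], np.sin(x[2])])
print(jacobian_fd(f, np.array([1.0, 2.0, 0.5])).shape)  # (2, 3)
```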

mathreadler
  • 25,824
  • Since I am a beginner, I have an idea about gradients and tensors (an n-dimensional generalization of vectors). As for second-order optimization, I am just starting off, so please enlighten me! – guru_007 Aug 05 '19 at 18:21
  • I think your derivation of the second-order derivative is correct; I just wanted to point out that we usually use multivariable functions when talking of Jacobians and Hessians. – mathreadler Aug 05 '19 at 18:32
  • I just don't see how these things add up to minimizing the loss function $ L=-\frac{1}{m}\big(y\log(h(x))+(1-y)\log(1-h(x))\big) $, where $h(x)=\frac{1}{1+e^{-(wx+b)}}$ and $m$ is the length of the vector $x$, and how $L$ leads to $\frac{dL}{dw} = \frac{1}{m}(h(x) - y)x$? – guru_007 Aug 06 '19 at 10:10
  • Keep reading. Probably something you missed. – mathreadler Aug 06 '19 at 12:31
  • Are you saying that the function has to be positive semi-definite for Newton's method to work? Can you brief me on that, please? – guru_007 Aug 07 '19 at 04:51
  • @guru_007: Sorry, that's a few too many questions at once, and you also change them over time. This kind of thing discourages people from answering. Newton's method needs certain properties to converge. Maybe you can ask a new question about that specifically? – mathreadler Aug 10 '19 at 16:16
  • I have figured out the answer myself; can you verify whether it's correct? @mathreadler – guru_007 Aug 13 '19 at 04:20
1

For the second derivative, you could do it faster: $$\sigma' (x)= \sigma(x)(1-\sigma(x))=\sigma(x)-\sigma^2(x)$$ $$\sigma'' (x)=\sigma' (x)-2 \sigma (x)\sigma' (x)=\sigma' (x)(1-2\sigma (x))=\sigma(x)(1-\sigma(x))(1-2\sigma (x))$$ which is not what you obtain.

Edit

In order to check the result, let us use the second-order central difference approximation $$f''(x) \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^{2}}$$ at $x=\frac 12$ and $h=\frac 1 {200}$.

This would give $-0.0575566$ while the formula I wrote gives $-0.0575568$; your formula leads to $0.292561$.
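For reference, here is the same check in a few lines of Python (a minimal sketch using the same $x$ and $h$):

```python
import numpy as np

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

x, h = 0.5, 1.0 / 200
# Closed form: sigma''(x) = sigma(x) (1 - sigma(x)) (1 - 2 sigma(x))
analytic = sigma(x) * (1 - sigma(x)) * (1 - 2 * sigma(x))
central = (sigma(x + h) - 2 * sigma(x) + sigma(x - h)) / h**2

print(analytic)  # approx -0.0575568
print(central)   # approx -0.0575566
```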

At least, we do not agree.

0

First of all, $f(x)$ has to be a function $$\mathbb R^{n} \to \mathbb R^{1}$$ that is twice differentiable, and its Hessian has to be positive semi-definite for Newton's method to head toward a minimum.
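A quick way to test that condition numerically (a minimal numpy sketch; the helper name and tolerance are my own choices):

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """A symmetric matrix is positive semi-definite iff all its eigenvalues are >= 0."""
    return np.all(np.linalg.eigvalsh(H) >= -tol)
```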

We already know from the question above that (dropping the constant factor $\frac{1}{m}$),

$$ \frac{\partial L} { \partial w} = (h_\theta(x) - y)x $$ $$ \sigma(x) = \frac{1}{1+e^{-(w^Tx+b)}}$$ Refer here for a proof of the first derivative of $ \sigma(x) $: $$ \sigma^{'}(x) = \sigma(x)(1-\sigma(x)) $$

Please note that here $ h_\theta(x) $ and $ \sigma(x) $ are one and the same; I just used $ \sigma(x) $ for representation's sake.

Now we need to find $ \frac{\partial^2 L}{ \partial w^2} $: $$ \begin{align*} \frac{\partial^2 L}{ \partial w^2} &= \frac{\partial}{\partial w}\big(xh_\theta(x) - xy\big) \\ \\ &= x \, \frac{\partial}{\partial w} h_\theta(x) \ \ \ \ \ \ \ \ \ [\ y \text{ does not depend on } w \ ] \\ \\ &= x^2 \big( \sigma(x) - \sigma(x)^2 \big) \ \ \ \ \ \ \ \ \ [\ h_\theta^{'}(x) = \sigma^{'}(x) = \sigma(x)(1-\sigma(x)), \text{ with another factor } x \text{ from the chain rule} \ ] \\ \\ & \text{or, in vector form,} \\ \\ &= x \,\big( \sigma(x) - \sigma(x)^2 \big)\, x^{T} \end{align*} $$ I am hoping that I am correct!
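To convince myself, here is a rough numpy sketch of the full Newton loop over a data matrix $X$ (my own illustration, not a reference implementation; the bias is omitted for brevity and the tiny ridge term is only for numerical safety):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Newton's method for logistic regression.

    g = (1/m) X^T (sigma(Xw) - y)        # gradient from above
    H = (1/m) X^T diag(s (1 - s)) X      # x (sigma - sigma^2) x^T, summed over rows
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iter):
        s = sigmoid(X @ w)
        g = X.T @ (s - y) / m
        H = (X.T * (s * (1.0 - s))) @ X / m
        H += 1e-8 * np.eye(n)          # tiny ridge so the solve never hits a singular H
        w -= np.linalg.solve(H, g)     # w_new = w_old - H^{-1} g
    return w
```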

guru_007
  • 111