
If the cost function is $L$, $$ L = -\frac{1}{m}\Big( y\log(h(x)) + (1-y)\log(1-h(x)) \Big) $$ $$ h(x)=\frac{1}{1+e^{-(w^{T}x+b)}} $$ then the first-order partial derivative of $L$ with respect to $w$ is $$ \frac{\partial L}{\partial w} = \frac{1}{m} \, ( h(x) - y )\,x $$
Question:
How do I find the second-order partial derivative of $L$ with respect to $w$, that is, $$ \frac{\partial ^{2}L}{\partial w^{2}}, $$
so that I can apply Newton's method and update the weights $w$ like this: $$ w_{new} = w_{old} - \left(\frac{\partial ^{2}L}{\partial w^{2}}\right)^{-1} \left( \frac{\partial L}{\partial w}\right) $$
I am just trying to figure out how Newton's method works with logistic regression.
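For concreteness, here is how I picture one update in code (a minimal numpy sketch of my own; `newton_step` just takes the gradient $g$ and Hessian $H$ as given):

```python
import numpy as np

def newton_step(w_old, g, H):
    """One Newton update: w_new = w_old - H^{-1} g."""
    # Solving H d = g is numerically preferable to forming H^{-1} explicitly.
    d = np.linalg.solve(H, g)
    return w_old - d
```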

guru_007
  • 111

3 Answers

2

It is not clear that you have grasped these concepts yet.

You should clarify inputs and outputs. What you seem to have done is calculate the second derivative of a scalar-valued function of one variable, in other words a $$\mathbb R^{1} \to \mathbb R^{1}$$ function. A Jacobian collects all the partial derivatives with respect to all the input variables. For a function $$\cases{x \in \mathbb R^n\\f(x) \in \mathbb R^m}$$

you get a Jacobian that is an $m\times n$ matrix (one row per output component, one column per input variable).

For the Hessian to be a matrix, we would need the function $f(x)$ to be $$\mathbb R^{n} \to \mathbb R^{1};$$

in the more general case

$$\mathbb R^{n} \to \mathbb R^{m}$$

it will be a 3-index tensor.
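A quick way to see these shapes numerically (a small numpy sketch; the forward-difference helper and the example function are made up for illustration):

```python
import numpy as np

def jacobian_fd(f, x, h=1e-6):
    """Forward-difference Jacobian of f: R^n -> R^m, returned as an m x n array."""
    fx = np.atleast_1d(f(x))
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.atleast_1d(f(xp)) - fx) / h
    return J

# f: R^3 -> R^2, so the Jacobian is 2 x 3
f = lambda x: np.array([x[0] * x[1], np.sin(x[2])])
print(jacobian_fd(f, np.array([1.0, 2.0, 0.5])).shape)  # (2, 3)
```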

mathreadler
  • 25,824
  • Since I am a beginner, I have an idea about gradients and tensors (an n-dimensional generalization of vectors). As for second-order optimization, I am just starting off, so please enlighten me! – guru_007 Aug 05 '19 at 18:21
  • I think your derivation of the second-order derivative is correct; I just wanted to point out that we usually use multivariable functions when talking of Jacobians and Hessians. – mathreadler Aug 05 '19 at 18:32
  • I just don't see how these things add up to minimizing the loss function $ L=-\frac{1}{m}\big(y\log(h(x))+(1-y)\log(1-h(x))\big) $, where $h(x)=\frac{1}{1+e^{-(wx+b)}}$ and $m$ is the length of the vector $x$, and how $L$ leads to $\frac{dL}{dw} = \frac{1}{m}(h(x) - y)x$? – guru_007 Aug 06 '19 at 10:10
  • Keep reading. Probably something you missed. – mathreadler Aug 06 '19 at 12:31
  • Are you saying that the function has to be positive semi-definite for Newton's method to work? Can you brief me on that, please? – guru_007 Aug 07 '19 at 04:51
  • @guru_007: Sorry, that's a few too many questions at once, and you also change them over time. This kind of thing discourages people from answering. Newton's method needs certain properties to converge. Maybe you can ask a new question about that specifically? – mathreadler Aug 10 '19 at 16:16
  • I have figured out the answer myself; can you verify whether it's correct? @mathreadler – guru_007 Aug 13 '19 at 04:20
1

For the second derivative, you could do it faster: $$\sigma' (x)= \sigma(x)(1-\sigma(x))=\sigma(x)-\sigma^2(x)$$ $$\sigma'' (x)=\sigma' (x)-2 \sigma (x)\sigma' (x)=\sigma' (x)(1-2\sigma (x))=\sigma(x)(1-\sigma(x))(1-2\sigma (x))$$ which is not what you obtain.

Edit

In order to check the result, let us use the second-order central difference approximation $$f''(x) \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^{2}}$$ at $x=\frac 12$ and $h=\frac 1 {200}$.

This would give $-0.0575566$ while the formula I wrote gives $-0.0575568$; your formula leads to $0.292561$.
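For reference, here is the same check in a few lines of Python (a minimal sketch using the same $x$ and $h$):

```python
import numpy as np

sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

x, h = 0.5, 1.0 / 200
# Closed form: sigma''(x) = sigma(x) (1 - sigma(x)) (1 - 2 sigma(x))
analytic = sigma(x) * (1 - sigma(x)) * (1 - 2 * sigma(x))
central = (sigma(x + h) - 2 * sigma(x) + sigma(x - h)) / h**2

print(analytic)  # approx -0.0575568
print(central)   # approx -0.0575566
```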

At least, we do not agree.

0

First of all, $f(x)$ has to be a function $$\mathbb R^{n} \to \mathbb R^{1}$$ that is twice differentiable, and its Hessian has to be positive semi-definite for Newton's method to head toward a minimum.
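A quick way to test that condition numerically (a minimal numpy sketch; the helper name and tolerance are my own choices):

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """A symmetric matrix is positive semi-definite iff all its eigenvalues are >= 0."""
    return np.all(np.linalg.eigvalsh(H) >= -tol)
```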

We already know from the question above that (dropping the constant factor $\frac{1}{m}$),

$$ \frac{\partial L} { \partial w} = (h_\theta(x) - y)x $$ $$ \sigma(x) = \frac{1}{1+e^{-(w^Tx+b)}}$$ Refer here for a proof of the first derivative of $ \sigma(x) $: $$ \sigma^{'}(x) = \sigma(x)(1-\sigma(x)) $$

Please note that here $ h_\theta(x) $ and $ \sigma(x) $ are one and the same; I just used $ \sigma(x) $ for representation's sake.

Now we need to find $ \frac{\partial^2 L}{ \partial w^2} $: $$ \begin{align*} \frac{\partial^2 L}{ \partial w^2} &= \frac{\partial}{\partial w}\big(xh_\theta(x) - xy\big) \\ \\ &= x \, \frac{\partial}{\partial w} h_\theta(x) \ \ \ \ \ \ \ \ \ [\ y \text{ does not depend on } w \ ] \\ \\ &= x^2 \big( \sigma(x) - \sigma(x)^2 \big) \ \ \ \ \ \ \ \ \ [\ h_\theta^{'}(x) = \sigma^{'}(x) = \sigma(x)(1-\sigma(x)), \text{ with another factor } x \text{ from the chain rule} \ ] \\ \\ & \text{or, in vector form,} \\ \\ &= x \,\big( \sigma(x) - \sigma(x)^2 \big)\, x^{T} \end{align*} $$ I am hoping that I am correct!
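To convince myself, here is a rough numpy sketch of the full Newton loop over a data matrix $X$ (my own illustration, not a reference implementation; the bias is omitted for brevity and the tiny ridge term is only for numerical safety):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Newton's method for logistic regression.

    g = (1/m) X^T (sigma(Xw) - y)        # gradient from above
    H = (1/m) X^T diag(s (1 - s)) X      # x (sigma - sigma^2) x^T, summed over rows
    """
    m, n = X.shape
    w = np.zeros(n)
    for _ in range(n_iter):
        s = sigmoid(X @ w)
        g = X.T @ (s - y) / m
        H = (X.T * (s * (1.0 - s))) @ X / m
        H += 1e-8 * np.eye(n)          # tiny ridge so the solve never hits a singular H
        w -= np.linalg.solve(H, g)     # w_new = w_old - H^{-1} g
    return w
```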

guru_007
  • 111