
On page 12, we take the $\log$ on both sides.

$\max_{\boldsymbol{w}} L(\boldsymbol{w}) = \max_{\boldsymbol{w}} \displaystyle\prod_{i=1}^N p(t^{(i)}|x^{(i)};\boldsymbol{w})$

$\ell(\boldsymbol{w}) = -\log L(\boldsymbol{w}) = -\displaystyle\sum_{i=1}^N \log p(t^{(i)}|x^{(i)};\boldsymbol{w})$

The $\log$ function is increasing in its argument. Why do we have to take the negative?

user8314628

1 Answer


It is common to define optimization problems as minimization problems rather than maximization problems. By multiplying your objective function by $-1$, you can transform one into the other:

$$\max_{w} \log L(w) \Leftrightarrow \min_{w} -\log L(w)$$

So to maximize the log-likelihood, you minimize the negative log-likelihood. Basically, it just comes down to conventions in optimization theory.
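As a concrete illustration, note that most off-the-shelf optimizers, such as `scipy.optimize.minimize`, only minimize. Here is a minimal sketch using made-up coin-flip data under a Bernoulli model (both the data and the model are my own example, not from the question):

```python
import numpy as np
from scipy.optimize import minimize

t = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical observed outcomes

def neg_log_likelihood(p):
    """-log L(p) = -sum_i log p(t_i | p) for a Bernoulli model."""
    p = np.clip(p, 1e-12, 1 - 1e-12)  # keep the argument away from log(0)
    return -np.sum(t * np.log(p) + (1 - t) * np.log(1 - p))

# The optimizer minimizes, so we pass the *negative* log-likelihood;
# minimizing it is the same as maximizing the log-likelihood.
result = minimize(neg_log_likelihood, x0=np.array([0.5]), bounds=[(0.0, 1.0)])
print(result.x)  # ~0.75, the sample mean, which is the Bernoulli MLE
```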

Moreover, since $L(w) \in [0,1]$, its logarithm $\log L(w)$ will be less than or equal to $0$ (note that $\log 0$ is not defined). Accordingly, $\max_{w} \log L(w)$ means maximizing a negative number, which is, at least to me, less intuitive than minimizing a positive number.

The more interesting part is actually the log-transformation, which increases the numerical stability of your calculations (since it "transforms" the multiplication into a sum and thereby reduces the risk of underflow).
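To see the underflow concretely, here is a toy sketch (the per-example probabilities are randomly generated, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.uniform(0.01, 0.99, size=5000)  # 5000 made-up per-example likelihoods

# The raw likelihood is a product of thousands of numbers in (0, 1):
# it underflows to exactly 0.0 in double precision.
print(np.prod(probs))

# The log-likelihood is a sum of logs and stays comfortably representable.
print(np.sum(np.log(probs)))  # a moderate negative number, roughly -5500
```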

Jonathan
  • Can you please tell me why the log is needed in the first place? And can you explain this line more, with an example: "$\max_w \log L(w)$ means to maximize a negative number"? Why do we need to maximize when the intention is to minimize the loss? – star Jan 03 '20 at 14:22
  • @Aj_MLstater $\max_{\boldsymbol{w}} L(\boldsymbol{w}) = \max_{\boldsymbol{w}} \displaystyle\prod_{i=1}^N p(t^{(i)}|x^{(i)};\boldsymbol{w})$ is a product. The problem with this product is that it consists of probabilities, which are by definition between $0$ and $1$. A product of many numbers between $0$ and $1$ is very small, which leads to numerical problems for computers. However, taking the $\log$ of it gives you a sum, since $\log{ab} = \log{a} + \log{b}$, and that is easier for computers to handle. – Jonathan Jan 04 '20 at 09:43
  • Thanks for the information. Will the result of the log of any value be between 0 and 1, just like how the probability of any value is between 0 and 1? Am I right? – star Jan 04 '20 at 20:45
  • @Aj_MLstater Not the logarithm but $p(t^{(i)}|x^{(i)};\boldsymbol{w})$ is between $0$ and $1$. I suggest reading https://en.wikipedia.org/wiki/Logarithm for more information. Looking at the graphs on the right-hand side gives a quick understanding of how $\log$ behaves for different bases and input values. – Jonathan Jan 05 '20 at 07:37