Why does setting the derivative of a likelihood function equal to 0 maximize the likelihood function?

Question

I'm learning from a statistics tutorial which defines a likelihood function as

\begin{align} L(1,3,2,2; \theta)=27 \cdot \theta^{8} (1-\theta)^{4} \tag{1} \end{align}

and then the tutorial sets the derivative of (1) to zero to find the value of $\theta$ that maximizes the likelihood function.

I understand where this formula comes from.

\begin{align} \frac{\text d L(1,3,2,2; \theta)}{\text d\theta}= 27 \big[8\theta^{7} (1-\theta)^{4}-4\theta^{8} (1-\theta)^{3} \big] \tag{2} \end{align}

I don't understand how to determine if setting (2) to 0 produces a maximum or minimum.

Per another tutorial, we could use the second derivative of the function to determine if it is a maximum or minimum.

Here is the second derivative of the likelihood function (1):

\begin{align} \frac{\text d^2 L(1,3,2,2; \theta)}{\text d\theta^2}= 108\left(\theta-1\right)^2\theta^6\left(33\theta^2-44\theta+14\right) \tag{3} \end{align}

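Setting (2) to zero and dividing out the common factor $108\,\theta^{7}(1-\theta)^{3}$, which is nonzero for $0<\theta<1$, gives

\begin{align} 2-3\theta = 0 \tag{4} \end{align}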

How do I use (3) to determine if it is a maximum or minimum?

Should I set (3) to zero and simplify it the same way to get (4)?

Any other method is also welcome.

This is a question for a maths forum: cancelling the derivative provides an extremum of the function (as the slope changes signs), which can be shown to be a maximum if the second derivative is negative (as the slope goes from increasing to decreasing). — Xi'an ні війні, Sep 12 '21 at 09:07
Your second question "How do I use (3) to determine if it is a maximum or minimum?" is answered in the tutorial you have quoted: "we do the test [the second derivative test] at the point where the slope is zero". Hence, you should substitute the $\theta$ that satisfies (2)/(4) into (3). — , Sep 12 '21 at 09:15
@B.Liu Thank you. So I substitute $\theta=2/3$ into (3), and see if it is less than, equal to, or greater than 0? — soplus2018, Sep 12 '21 at 09:22
@soplus2018 Correct. You should find the result of the second derivative test consistent with the claim that $\theta = 2/3$ maximises the likelihood function (for $\theta$ between 0 and 1). — , Sep 12 '21 at 09:25
"Why does setting the derivative of a likelihood function equal to 0 maximize the likelihood function" -- it doesn't, in general. It finds turning points. Some turning points are local maxima. Some are local minima. Some are neither. Some maxima and minima are not turning points (a point which is often lost on people doing likelihood questions). See books on basic calculus, which should discuss the conditions. — Glen_b, Sep 12 '21 at 14:17

score 6 · Answer 1 · answered Sep 12 '21 at 10:09

To quote from Wikipedia:

One way to state Fermat's theorem is that, if a function has a local extremum at some point and is differentiable there, then the function's derivative at that point must be zero. In precise mathematical language:

Let $f\colon (a,b) \rightarrow \mathbb{R}$ be a function and suppose that $x_0 \in (a,b)$ is a point where $f$ has a local extremum. If $f$ is differentiable at $x_0$, then $f'(x_0) = 0$.

and

After establishing the critical points of a function, the second-derivative test uses the value of the second derivative at those points to determine whether such points are a local maximum or a local minimum. If the function $f$ is twice-differentiable at a critical point $x$ (i.e. a point where $f'(x) = 0$) then:

If $f''(x) < 0$, then $f$ has a local maximum at $x$.

If $f''(x) > 0$, then $f$ has a local minimum at $x$.

If $f''(x) = 0$, the test is inconclusive.
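Applied to the likelihood (1) from the question, these two results could be used as follows. This is a worked sketch rather than part of the quoted text: the factored forms are a rearrangement of (1), and $\hat\theta$ is just a label introduced here for the critical point.

\begin{align}
L'(\theta) &= 27\big[8\theta^{7}(1-\theta)^{4} - 4\theta^{8}(1-\theta)^{3}\big] = 108\,\theta^{7}(1-\theta)^{3}(2-3\theta), \\
L''(\theta) &= 108\,\theta^{6}(1-\theta)^{2}\big(33\theta^{2}-44\theta+14\big).
\end{align}

For $0<\theta<1$ the only zero of $L'$ is $\hat\theta = 2/3$, and

\begin{align}
L''(2/3) = 108\cdot\frac{64}{729}\cdot\frac{1}{9}\cdot\left(-\frac{2}{3}\right) = -\frac{512}{729} < 0,
\end{align}

so $\hat\theta = 2/3$ is a local maximum. Since $L(0)=L(1)=0$ while $L(2/3)=256/19683>0$, it is also the maximum over $[0,1]$. The positive constant in front of $L''$ does not affect the sign, so only the factor $33\theta^{2}-44\theta+14$ decides the outcome of the test.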

score 4 · Accepted Answer · answered Sep 12 '21 at 11:11

Xi'an is obviously right, but maybe a less formal description can help develop your intuition.

You have one parameter, so you could create a graph with the parameter on the x-axis and the likelihood on the y-axis. The first derivative is the slope of that curve at that parameter value. If the derivative is positive, then the curve is upward sloping so there is a higher likelihood somewhere to the right and the current parameter value cannot be the maximum. Similarly, if the derivative is negative, then the curve is downward sloping so there is a higher likelihood somewhere to the left and the current parameter value cannot be the maximum. Only when the derivative is zero can that parameter value result in a maximum likelihood value. So that is a necessary condition.
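To make the picture concrete, here is a minimal sketch in plain Python that evaluates the likelihood from the question on a grid and estimates the slope by a central finite difference. The function names, the grid size, and the step `h` are my own choices for illustration, not anything from the tutorial.

```python
def likelihood(theta):
    """Likelihood (1) from the question: 27 * theta^8 * (1 - theta)^4."""
    return 27 * theta**8 * (1 - theta)**4


def slope(theta, h=1e-6):
    """Central finite-difference estimate of the first derivative."""
    return (likelihood(theta + h) - likelihood(theta - h)) / (2 * h)


# Evaluate the likelihood on an evenly spaced grid over [0, 1].
n = 1000
grid = [i / n for i in range(n + 1)]
values = [likelihood(t) for t in grid]

# Grid point with the highest likelihood (only accurate to the grid spacing).
best_index = max(range(len(grid)), key=values.__getitem__)
theta_best = grid[best_index]

# Slopes on either side of the peak: positive to the left, negative to the right.
slope_left = slope(0.5)
slope_right = slope(0.8)

print("grid maximiser:", theta_best)
print("slope at 0.5:", slope_left)
print("slope at 0.8:", slope_right)
```

The grid maximiser should land within one grid step of $2/3$, with an upward slope to its left and a downward slope to its right, which is the pattern described above.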

However, that value could also be a minimum. To distinguish between those we look at the second derivative. This tells us how the slope changes. For a maximum we start with a positive slope, which decreases as we move to the right. So the second derivative is negative for a maximum. For a minimum we start with a negative slope, which becomes less negative (i.e. increases) as we move to the right. So the second derivative is positive for a minimum.
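The same reasoning can be checked exactly for the likelihood in the question. This is a minimal sketch using Python's `fractions` module so that no rounding is involved; the function names are hypothetical, and the closed form used for the second derivative is my own differentiation of (1), so treat it as an assumption to verify rather than the tutorial's formula.

```python
from fractions import Fraction


def d_likelihood(theta):
    """First derivative of 27 * theta^8 * (1 - theta)^4, as in (2)."""
    return 27 * (8 * theta**7 * (1 - theta)**4 - 4 * theta**8 * (1 - theta)**3)


def d2_likelihood(theta):
    """Second derivative of the same likelihood, in factored form (assumed)."""
    return 108 * theta**6 * (1 - theta)**2 * (33 * theta**2 - 44 * theta + 14)


theta_hat = Fraction(2, 3)  # candidate obtained from 2 - 3*theta = 0

slope_at_hat = d_likelihood(theta_hat)        # zero slope: a critical point
curvature_at_hat = d2_likelihood(theta_hat)   # negative: a local maximum

# Slope just to the left and right of the candidate.
slope_left = d_likelihood(Fraction(1, 2))
slope_right = d_likelihood(Fraction(3, 4))

print(slope_at_hat)       # 0
print(curvature_at_hat)   # -512/729
print(slope_left > 0, slope_right < 0)
```

The slope is positive before $2/3$, zero at $2/3$, and negative after it, and the second derivative there is negative, so the critical point is a maximum rather than a minimum.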
