
I am a bit confused about the derivation of MLE of Uniform$(0,\theta)$.

I understand that $L(\theta)={\theta}^{-n}$ is a decreasing function and to find the MLE we want to maximize the likelihood function.

What is confusing me is that if a function is decreasing, then wouldn't the function be maximized at the smallest input rather than the largest?

Thank you in advance for your help.

hyg17
  • Your formula for $L(\theta)$ is incorrect. What happens when you have data points with values larger than $\theta$? – Brian Borchers Oct 03 '18 at 19:29
  • I did. No matter where I look, they say that the function is decreasing, therefore the maximum X maximizes L(theta), as if it were self-evident. I know there must be a misunderstanding on my part, but I cannot figure out what it is. – hyg17 Oct 03 '18 at 19:31
  • Brian: I am assuming that you are referring to the fact that all X_is are between 0 and theta. – hyg17 Oct 03 '18 at 19:32
  • A more detailed explanation of what @BrianBorchers comments above is given in this (highly upvoted) Answer to a previous Question. – hardmath Oct 03 '18 at 21:40
  • Thank you for providing the link, but I already saw that. I don't understand the part that says, "decreasing function maximizes at the largest X." $1/x$, for example, gets larger as $x$ gets smaller, as long as $x>0$. – hyg17 Oct 03 '18 at 21:51
  • @hardmath Highly upvoted but highly incorrect in my mind. – StubbornAtom Oct 04 '18 at 18:47
  • @hardmath The objection with that post is that the likelihood function is not differentiable at $\max x_i$. As such, differentiating the likelihood to find the MLE is not justified. – StubbornAtom Oct 04 '18 at 20:01
  • @StubbornAtom: Okay, maybe the maximization of the likelihood function is worth my two cents, assuming this Question stays open (it is hovering on the threshold to close now), but I can post on the proposed duplicate and would welcome critiques there. – hardmath Oct 04 '18 at 20:25
  • See https://math.stackexchange.com/questions/649678/how-do-you-differentiate-the-likelihood-function-for-the-uniform-distribution-in?rq=1. – StubbornAtom Oct 05 '18 at 18:29

1 Answer


Welcome back to MSE.

This is one of those things that makes sense once it is explained correctly, with no gaps, the first time. Unfortunately, in my experience, most answers and even professors don't spell out all of the details.

Suppose $X_1, \dots, X_n$ are independent and distributed $\text{Uniform}(0, \theta)$, with $\theta > 0$.

Let $\mathbf{I}$ denote the indicator function, where $$\mathbf{I}(\cdot) = \begin{cases} 1, & \cdot \text{ is true} \\ 0, & \cdot \text{ is false.} \end{cases}$$

The probability density function of any of the $X_i$, for $i \in \{1, \dots, n\}$, can be written like so: $$f_{X_i}(x_i \mid \theta) = \dfrac{1}{\theta}\cdot\mathbf{I}(0<x_i<\theta)\text{.}$$ The likelihood function is thus given by $$\begin{align} L(\theta)&=f_{X_1, \dots, X_n}(x_1, \dots, x_n \mid \theta)\\ &=\prod_{i=1}^{n}f_{X_i}(x_i \mid \theta) \\ &= \dfrac{1}{\theta^n}\prod_{i=1}^{n}\mathbf{I}(0 < x_i < \theta)\text{.} \end{align}$$
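As a quick sanity check of this formula, here is a minimal Python/NumPy sketch of my own (the sample values are made up, not part of the derivation). The indicator factors are exactly what make data points larger than $\theta$ kill the likelihood, which is the point raised in the comments:

```python
import numpy as np

def likelihood(theta, x):
    """Likelihood of an i.i.d. Uniform(0, theta) sample, indicator factors included."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Every observation must lie in (0, theta); otherwise the product of indicators is 0.
    inside_support = np.all((x > 0) & (x < theta))
    return (1.0 / theta**n) * inside_support

x = [0.8, 2.3, 1.1]        # hypothetical observations
print(likelihood(2.0, x))  # 0.0 -- a theta below max(x) zeroes out the likelihood
print(likelihood(2.5, x))  # 1/2.5^3 = 0.064
print(likelihood(3.0, x))  # 1/3^3 ~ 0.037 -- increasing theta further only decreases L
```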

The following claim, although used, is often omitted from explanations:

Claim. Let $A$ and $B$ be events. Then $\mathbf{I}(A)\cdot \mathbf{I}(B)=\mathbf{I}(A \cap B)$.

I leave the proof of this to you. Note that $ 0 < x_i < \theta$ is the same as requiring both $x_i > 0$ and $x_i < \theta$. Hence, we write $$\begin{align} L(\theta)&=\dfrac{1}{\theta^n}\prod_{i=1}^{n}\mathbf{I}(0 < x_i < \theta) \\ &= \dfrac{1}{\theta^n}\prod_{i=1}^{n}[\mathbf{I}(x_i > 0)\mathbf{I}(x_i < \theta)] \\ &= \dfrac{1}{\theta^n}\prod_{i=1}^{n}[\mathbf{I}(x_i > 0)]\prod_{j=1}^{n}[\mathbf{I}(x_j < \theta)]\text{.} \end{align}$$

It will be clear why I split the product as above in a bit.

The claim given above extends to an arbitrary finite number of events as well. Thus,

$$\prod_{i=1}^{n}[\mathbf{I}(x_i > 0)] = \mathbf{I}(x_1 > 0 \cap x_2 > 0 \cap \cdots \cap x_n > 0)$$ and $$\prod_{j=1}^{n}[\mathbf{I}(x_j < \theta)] = \mathbf{I}(x_1 < \theta \cap x_2 < \theta \cap \cdots \cap x_n < \theta)\text{.}$$

The next claims are often omitted as well from explanations:

Claim 1. Given $x_1, \dots, x_n \in \mathbb{R}$, $x_1, \dots, x_n < k$ if and only if $$x_{(n)}:=\max_{1 \leq i \leq n}x_i < k\text{.}$$

Claim 2. Given $x_1, \dots, x_n \in \mathbb{R}$, $x_1, \dots, x_n > k$ if and only if $$x_{(1)}:=\min_{1 \leq i \leq n}x_i > k\text{.}$$

Thus $$\prod_{i=1}^{n}[\mathbf{I}(x_i > 0)] = \mathbf{I}(x_1 > 0 \cap x_2 > 0 \cap \cdots \cap x_n > 0) = \mathbf{I}(x_{(1)} > 0)$$ and $$\prod_{j=1}^{n}[\mathbf{I}(x_j < \theta)] = \mathbf{I}(x_1 < \theta \cap x_2 < \theta \cap \cdots \cap x_n < \theta) = \mathbf{I}(x_{(n)} < \theta)\text{.}$$ The likelihood function is thus $$L(\theta) = \dfrac{1}{\theta^n}\mathbf{I}(x_{(1)} > 0)\mathbf{I}(x_{(n)} < \theta)\text{.}\tag{*}$$ Now, consider the above as a function of $\theta$. For all intents and purposes, $\mathbf{I}(x_{(1)} > 0)$ is irrelevant when it comes to maximization of $L$ with respect to $\theta$, because it is independent of $\theta$. So, the part that really matters is $$L(\theta) \propto \dfrac{1}{\theta^n}\mathbf{I}(x_{(n)} < \theta) = \dfrac{1}{\theta^n}\mathbf{I}(\theta > x_{(n)})\text{.}\tag{**}$$ Generally, when doing maximum-likelihood estimation, we assume that the observed $x_i$ fall within the support of the given distribution, so we'll just assume $x_{(1)} > 0$.
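If it helps to see $(**)$ concretely as a function of $\theta$, here is a small numerical sketch (again Python/NumPy with a hypothetical sample; the grid and values are only for illustration). The reduced likelihood is $0$ for $\theta \leq x_{(n)}$ and strictly decreasing afterwards, so its largest value on the grid occurs at the smallest grid point exceeding $x_{(n)}$:

```python
import numpy as np

x = np.array([0.8, 2.3, 1.1])     # hypothetical sample with x_(1) > 0
n, x_max = len(x), x.max()

# Evaluate (**) on a grid of theta values.
thetas = np.linspace(0.5, 5.0, 1000)
L = np.where(thetas > x_max, thetas**(-n), 0.0)

print(x_max)                 # 2.3 = x_(n)
print(thetas[np.argmax(L)])  # the smallest grid point exceeding x_(n):
                             # L is 0 below x_(n) and strictly decreasing above it
```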

Remember to view (**) as a function of $\theta$. If $\theta \leq x_{(n)}$, then $L(\theta) = 0$ because of the indicator function. This cannot be the maximum of $L$: a likelihood is nonnegative (it is, at its crux, a joint density evaluated at the observed data), and here it is strictly positive whenever $\theta > x_{(n)}$, so the value $0$ is beaten by any such $\theta$.

So, in attempting to maximize $L$, suppose that $\theta > x_{(n)}$. For $n$ fixed, we obtain $$L(\theta) \propto\dfrac{1}{\theta^n}\text{.}$$ Now, note that $\dfrac{1}{\theta^n}$ is indeed a decreasing function of $\theta$ with $n$ fixed. Thus, we must make $\theta$ as small as possible, given our restriction of $\theta > x_{(n)}$.

Note. Technically, no maximizing $\theta$ exists here: any $\theta > x_{(n)}$ can be beaten by a slightly smaller value that still exceeds $x_{(n)}$. This is often ignored in many textbooks.

Most textbooks will then say that the maximum likelihood estimator of $\theta$ is $$\hat{\theta}_{\text{MLE}} = X_{(n)}\text{.}$$
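In code, that textbook estimator is nothing more than the sample maximum; here is a short simulated sketch (the seed, sample size, and $\theta = 4$ are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 4.0
sample = rng.uniform(0.0, theta_true, size=50)  # simulated Uniform(0, theta) data

theta_mle = sample.max()   # the textbook estimator: hat{theta}_MLE = X_(n)
print(theta_mle)           # a bit below 4.0, since X_(n) < theta with probability 1
```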

Note. Technically, the above result is false. The MLE does not exist, because $\theta$ cannot take on the value $x_{(n)}$ itself. For this answer to be correct, the support of the uniform PDF must include $\theta$ itself (because the maximum likelihood estimator equals one of the $X_i$). The reason for this is discussed in Lecture 2: Maximum Likelihood Estimators from MIT OpenCourseWare 18-443 Statistics for Applications. As the question currently stands, $(0, \theta)$ should be $(0, \theta]$.

Clarinetist