3

A sufficient statistic for a parameter is a statistic that captures all the information about a given parameter contained in the sample.

My question: Is the above sentence correct? (I think it is.) If yes, then what is the purpose of a sufficient statistic? I mean, it does not give any additional information about the unknown parameter (to be estimated) that is not already present in the sample in the first place. So what is the use of sufficiency in Mathematical Statistics?

EDIT 1:

After @user164740's response:

My queries:

1) So does it mean that a sufficient statistic can have less information about the parameter to be estimated than is present in the given sample?

2) And how would a worse statistic (in terms of the information it contains about the parameter) help if the given statistic is not helpful? I mean, how is the given sufficient statistic helpful, and how would a worse statistic be helpful in estimating a parameter?

Richard J
  • 183
  • Did you know that there is a statistics site in the stackexchange network? It's called CrossValidated. There should be a link to it on this page. – Gerry Myerson Mar 12 '15 at 11:30

2 Answers

9

Your definition of sufficiency is correct.

Sufficiency pertains to data reduction, not merely estimation. A sufficient statistic need not estimate anything. For example, if $X_1, \ldots, X_n$ are iid samples drawn from an exponential distribution with unknown mean $\theta$, then $\bar X$ is sufficient for $\theta$, but so is $(X_1 + \cdots + X_{n-1}, X_n)$. The former achieves greater data reduction; the latter achieves less, since it consists of two numbers. The former is itself an estimator of $\theta$; the latter does not estimate $\theta$ directly, and you need to transform it somehow. You could, for example, decide to take $X_n$ from this sufficient statistic as your estimator, but that estimator is neither sufficient nor a particularly "good" one.
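
For a quick sketch of why $\bar X$ (equivalently $\sum_i X_i$) is sufficient here, one can apply the Fisher–Neyman factorization theorem to the joint density, assuming the mean parametrization $f(x; \theta) = \theta^{-1} e^{-x/\theta}$ for $x > 0$:

$$ f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} e^{-x_i/\theta} = \underbrace{\theta^{-n} \exp\!\left(-\frac{1}{\theta} \sum_{i=1}^{n} x_i\right)}_{g\left(\sum_i x_i,\ \theta\right)} \cdot \underbrace{1}_{h(x_1, \ldots, x_n)} $$

The joint density depends on the data only through $\sum_i x_i$, so $\sum_i X_i$ (and hence $\bar X$) is sufficient for $\theta$.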

The purpose of sufficiency is to demonstrate that statistics that satisfy this property do not discard information about the parameter, and as such, estimators that might be based on a sufficient statistic are in a sense "good" ones to choose.

In regard to your second question, let's go back to the exponential example. A non-sufficient statistic mentioned above is $X_n$. This statistic simply discards all the previous observations and keeps only the last. And yes, it does estimate $\theta$: note that $\operatorname{E}[X_n] = \theta$ by definition, so it is even an unbiased estimator. But does it perform well? No; its variance is $\theta^2$ regardless of the sample size, meaning that no matter how large a sample you take, this estimator never concentrates any more tightly around the true value of $\theta$. Of course, this makes intuitive sense: you've discarded all the previous observations.
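
To see this numerically, here is a minimal simulation sketch (assuming Python with numpy; the true mean $\theta = 2$ and the sample sizes are chosen arbitrarily for illustration). The spread of $\bar X$ shrinks like $\theta/\sqrt{n}$, while the spread of $X_n$ stays near $\theta$ no matter how large $n$ gets:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0          # true mean (chosen arbitrarily for illustration)
reps = 10_000        # Monte Carlo repetitions per sample size

for n in (10, 100, 1000):
    samples = rng.exponential(scale=theta, size=(reps, n))
    xbar = samples.mean(axis=1)   # sample mean: based on a sufficient statistic
    xlast = samples[:, -1]        # "keep only the last observation" estimator
    print(f"n={n:5d}  sd(xbar)={xbar.std():.3f}  sd(X_n)={xlast.std():.3f}")
```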

A better estimator would be the mean of the odd-numbered observations only, say $(X_1 + X_3 + \cdots + X_{2n-1})/n$ when the sample has $2n-1$ observations (note the division by $n$, the number of odd-indexed observations), and yes, this too is an unbiased estimator of $\theta$. Still, you can see why it's not as good as the mean of all the observations. It does achieve data reduction, but since it is not a sufficient statistic, it "wastes" too much. That's what being able to show sufficiency gets you: an estimator based on a sufficient statistic isn't "wasteful."
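
As a rough check of that claim, here is a minimal simulation sketch in the same spirit (again assuming numpy and an arbitrarily chosen $\theta = 2$): both estimators are roughly unbiased, but the odd-observations-only mean has about twice the variance of the full-sample mean because it throws away half the data.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, reps, n = 2.0, 10_000, 1000   # illustrative values

samples = rng.exponential(scale=theta, size=(reps, n))
full_mean = samples.mean(axis=1)          # uses all n observations
odd_mean = samples[:, ::2].mean(axis=1)   # uses only X_1, X_3, ... (half the observations)

print(f"var(full mean) = {full_mean.var():.5f}")   # roughly theta^2 / n
print(f"var(odd mean)  = {odd_mean.var():.5f}")    # roughly twice as large
```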

heropup
  • 135,869
2

Our problem is to estimate a parameter from some sample data. In order to do that, we'll find a statistic of that sample. A sufficient statistic is a statistic such that no other statistic of that same sample provides more information about the value of the parameter that we're trying to estimate. In other words, we can't find a better statistic (even if our sufficient statistic doesn't contain all the information about the parameter), but we can choose a worse statistic to use for our estimation.

  • Thanks. For better clarity I have put my further query (after reading your answer) in my original question itself as EDIT 1. Please respond to that. – Richard J Mar 12 '15 at 11:59
  • A sufficient statistic has all the information you're going to get from a statistic of a sample, regarding a particular parameter, in the sense that you can't find a statistic of the sample with more information. What I'm trying not to say is "A sufficient statistic has enough information to estimate the parameter". So we have full information, given the sample and the statistic, but "full" may not be enough. – user164740 Mar 12 '15 at 14:22
  • It wouldn't be helpful, but that doesn't mean we couldn't do it. – user164740 Mar 12 '15 at 14:24