1

I was wondering: are there necessary criteria that must be satisfied when creating a new statistical mean?

This question came to mind while I was studying the arithmetic mean, geometric mean and harmonic mean. I couldn't help noticing that, for example, the geometric mean cannot accept zero or negative inputs.

So I thought: let's create a new mean (in the same spirit, ish, as someone creating a new distance in mathematics, although a distance must satisfy well-defined properties).

I came up with the following trivial criteria, but I wonder whether anything else is necessary.

Let's call $\theta$ the new mean, acting on $n$ variables.

  • If $a, b > 0$ then $\theta(a, b) > 0$
  • $\theta(a, b) < a + b$
  • $\theta(a, b) < \max\{ a, b \}$
  • $\theta(a, b) > \min \{a, b \}$

Is there anything else one would expect?

Heidegger
  • 3,229

2 Answers

2

All "averages" or "centers" are minimizers of some measure of total distance from the rest of the population. Observe that the average must be a value most "similar" to every other element in the population, so it should therefore be the value with the least dissimilarity or error, according to some chosen measure. (And an average must be associated with a measure of error since otherwise there is no sense of closeness.)

As a brief survey, here are some common averages with the error that they minimize, where $\bar{x}$ is an average and $x_i$ are the $n$ many elements of the population.

  • (arithmetic) mean | minimizes the mean squared error (MSE) . . . $\frac1n\sum (\bar{x}-x_i)^2$. (As an aside, the convenient mathematical properties of this measure of error are one of the greatest reasons the mean is so widely used.)

  • median | mean absolute error (MAE) . . . $\frac1n\sum |\bar{x}-x_i|$

  • mode | total number of elements not equal to $\bar{x}$ . . . $\sum \bigl(1-\delta(\bar{x},x_i)\bigr)$ where $\delta$ is the Kronecker delta (equal to $1$ when its arguments coincide and $0$ otherwise)

  • harmonic mean | squared difference of reciprocals . . . $\frac1n\sum\left(\frac1{\bar{x}}-\frac1{x_i}\right)^2$

  • geometric mean | squared log difference . . . $\frac1n\sum \left(\log \bar{x}-\log x_i\right)^2$
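
As a sketch of why the arithmetic, harmonic and geometric entries hold: each error has the form $E(\bar{x}) = \frac1n\sum \left(f(\bar{x})-f(x_i)\right)^2$, where $f$ is notation introduced here for illustration (not part of the list above) and is assumed differentiable with $f' \neq 0$ on the domain. Setting the derivative to zero gives

$$\frac{dE}{d\bar{x}} = \frac{2}{n}\, f'(\bar{x}) \sum \left(f(\bar{x})-f(x_i)\right) = 0 \quad\Longrightarrow\quad f(\bar{x}) = \frac1n\sum f(x_i) \quad\Longrightarrow\quad \bar{x} = f^{-1}\!\left(\frac1n\sum f(x_i)\right).$$

Taking $f(x)=x$ recovers the arithmetic mean, $f(x)=\frac1x$ gives $\bar{x} = n \big/ \sum \frac1{x_i}$ (the harmonic mean), and $f(x)=\log x$ gives $\bar{x} = \left(\prod x_i\right)^{1/n}$ (the geometric mean), the last two for positive $x_i$.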

So, to invent a new center, you would equivalently be specifying some reasonable measure of error. A convenient approach is to assume that individual errors combine additively and are given by a dissimilarity function $d$, such as a metric. Then your center is the minimizer of $\frac1n\sum d(\bar{x},x_i)$.
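
The idea of a center as the minimizer of $\frac1n\sum d(\bar{x},x_i)$ can be checked numerically. The following is only a minimal sketch: the helper names `minimize_unimodal` and `center`, the sample data, and the choice of ternary search are all assumptions made for illustration. It also assumes positive data and an average error that is unimodal on $[\min x_i, \max x_i]$, which holds for the four errors tried here.

```python
import math
import statistics


def minimize_unimodal(loss, lo, hi, iters=200):
    """Ternary search for a minimizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loss(m1) < loss(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2


def center(data, d):
    """Minimizer over [min(data), max(data)] of (1/n) * sum of d(c, x_i)."""
    def loss(c):
        return sum(d(c, x) for x in data) / len(data)
    return minimize_unimodal(loss, min(data), max(data))


# Hypothetical sample; an odd count keeps the median unique.
data = [1.0, 2.0, 3.0, 6.0, 12.0]

# Per-element errors d(c, x) taken from the survey above.
errors = {
    "squared": lambda c, x: (c - x) ** 2,
    "absolute": lambda c, x: abs(c - x),
    "reciprocal": lambda c, x: (1 / c - 1 / x) ** 2,
    "log": lambda c, x: (math.log(c) - math.log(x)) ** 2,
}

# Closed-form averages that each error should reproduce.
closed_forms = {
    "squared": statistics.mean(data),
    "absolute": statistics.median(data),
    "reciprocal": statistics.harmonic_mean(data),
    "log": statistics.geometric_mean(data),
}

for name, d in errors.items():
    print(name, center(data, d), closed_forms[name])
```

Each printed pair should agree up to numerical tolerance. To experiment with a new center, add another function `d(c, x)` to `errors`, keeping in mind that ternary search is only reliable when the resulting average error is unimodal. `statistics.geometric_mean` requires Python 3.8 or later.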

Jam
  • 10,325
2

A few reasonable requirements I'd suggest:

  • Commutativity: $\theta(a, b) = \theta(b, a)$.
  • Consistency: Duplicating all values in the dataset doesn't change the mean: $\theta(a, a, b, b) = \theta(a, b)$.
  • Homogeneity: Scaling all numbers in the dataset by a positive constant scales the mean by the same constant: $\theta(ac, bc) = c\theta(a, b)$.
  • Monotonicity: If $0 \le b \le c$, then $\theta(a, b) \le \theta(a, c)$.
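
These four requirements are easy to spot-check on random positive inputs. The sketch below is illustrative only: the name `check_mean_properties`, the sampling range, and the tolerances are assumptions, and passing random trials is evidence rather than a proof. A candidate mean is assumed to be a function taking a list of positive numbers.

```python
import math
import random
import statistics


def check_mean_properties(theta, trials=1000, seed=0):
    """Spot-check a candidate mean `theta` (list of floats -> float)."""
    rng = random.Random(seed)
    ok = {
        "commutativity": True,
        "consistency": True,
        "homogeneity": True,
        "monotonicity": True,
    }
    for _ in range(trials):
        a, b, c, k = (rng.uniform(0.1, 10.0) for _ in range(4))
        base = theta([a, b])
        # Commutativity: theta(a, b) == theta(b, a)
        if not math.isclose(base, theta([b, a]), rel_tol=1e-9):
            ok["commutativity"] = False
        # Consistency: theta(a, a, b, b) == theta(a, b)
        if not math.isclose(base, theta([a, a, b, b]), rel_tol=1e-9):
            ok["consistency"] = False
        # Homogeneity: theta(a*k, b*k) == k * theta(a, b) for k > 0
        if not math.isclose(theta([a * k, b * k]), k * base, rel_tol=1e-9):
            ok["homogeneity"] = False
        # Monotonicity: lo <= hi implies theta(a, lo) <= theta(a, hi)
        lo, hi = sorted((b, c))
        if theta([a, lo]) > theta([a, hi]) * (1 + 1e-12):
            ok["monotonicity"] = False
    return ok


print(check_mean_properties(statistics.fmean))
print(check_mean_properties(statistics.geometric_mean))
print(check_mean_properties(statistics.harmonic_mean))
# The plain sum is not a mean: duplicating the data doubles it.
print(check_mean_properties(sum))
```

The three classical means should pass every check, while `sum` should fail only consistency, which shows that the requirements are not redundant with one another.
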
Dan
  • 14,978