2

I just enrolled in an AP Statistics course this week, and one thing that has come up a lot in descriptive statistics is the idea of the mean. I understand the mean as the point closest to all of the data points, but I don't understand the rationale behind the formula. How did the person who came up with it arrive at it? What's the intuition? Can anyone explain that for me?

Why is mean = (sum of observations) / (number of observations)?

  • 1
    You may instead pick any linear functional that is symmetric in its input and maps constant input to the common value ... – Hagen von Eitzen Apr 02 '22 at 14:57
  • 2
    @Hagen von Eitzen - do you really expect the person asking this question to have any idea what “linear functional” means, or that that comment could be helpful? – Adam Rubinson Apr 02 '22 at 15:20
  • @Tortar I don’t think it gives me valid reasons why the formula makes sense. – Vinay Sharma Apr 02 '22 at 16:10
  • 1
    First, "mean" is just the usual average of a list of numbers (the numbers don't have to all be distinct), such as the average of your test scores. Probably the most intuitively basic way of characterizing the mean is that it's the only number with the property that the sum of its signed deviations from the numbers in the list is 0. For example, consider $-4,\,2,\,11.$ The mean of these numbers is $3,$ and the signed deviations of these numbers from $3$ are: $-7$ (because $-4$ is $7$ to the left of $3)$ and $-1$ (because $2$ is $1$ to the left of $3)$ (continued) – Dave L. Renfro Apr 03 '22 at 08:52
  • 1
    and $8$ (because $11$ is $8$ to the right of $3).$ This is essentially what JMP's answer shows. You might find it helpful to try this with other specific examples. Then maybe see if you can show this in general for three different numbers $a,\,b,\,c$ such that $a < b < c$ (and afterwards, see if your algebra calculation still works if two or all three of these numbers are equal). Proving this property in general requires taking finitely many arbitrarily specified numbers (e.g. $x_1,\,x_2,\,\ldots,\,x_n)$ and showing two things: (1) there exists a number $y$ (continued) – Dave L. Renfro Apr 03 '22 at 08:53
  • 1
    (the number $y$ might be equal to one of the numbers in your list of numbers, or it might not be equal to any of them) such that the signed deviations of $y$ from each of the originally specified numbers add to $0,$ and (2) only one such number $y$ has this property. This is not something you should be especially concerned about being able to show (at least not at this time), but it's essentially what JMP's answer does. – Dave L. Renfro Apr 03 '22 at 08:54
  • 1
    After posting my comments (which I took a long time to write, as I was working on other stuff while doing so), I noticed this question has been closed and another question cited. At that other question, this answer by mweiss (the "Edited to add" part) is the same thing I was describing. – Dave L. Renfro Apr 03 '22 at 08:58

1 Answer

2

Imagine that the $n$ data points are on a see-saw. It is possible to place a pivot so that the see-saw balances; call this position $y$.

Suppose $k$ of the data points, $x_1,\ldots,x_k$, lie to the left of the pivot. Their distances from $y$ add up to

$$\sum_{i=1}^k (y-x_i) = ky - \sum_{i=1}^k x_i$$

On the right, the $n-k$ data points contribute

$$\sum_{i=k+1}^n (x_i - y) = \sum_{i=k+1}^n x_i - (n-k)y$$

The see-saw balances when the two sides contribute equally, that is, when the difference between them is zero:

$$ky - \sum_{i=1}^k x_i = \sum_{i=k+1}^n x_i - (n-k)y$$

$$ky + (n-k)y = \sum_{i=1}^k x_i + \sum_{i=k+1}^n x_i$$

$$ny = \sum_{i=1}^n x_i$$

$$y = \frac1n\sum_{i=1}^n x_i$$
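As a quick numerical check of the balance argument, here is a minimal Python sketch. The function names `balance_point` and `left_right_totals` are made up for illustration, and the data $-4, 2, 11$ is borrowed from the comments under the question. It computes the pivot from the final formula and then adds up the distances on each side.

```python
def balance_point(xs):
    """Hypothetical helper: the pivot y = (sum of data) / (number of data)."""
    return sum(xs) / len(xs)


def left_right_totals(xs, y):
    """Hypothetical helper: total distance from y on each side of the pivot."""
    left = sum(y - x for x in xs if x < y)    # points to the left of y
    right = sum(x - y for x in xs if x > y)   # points to the right of y
    return left, right


data = [-4, 2, 11]
y = balance_point(data)
left, right = left_right_totals(data, y)
print(y, left, right)
```

If the pivot really is the balance point, the two totals printed after `y` should be equal. Moving `y` away from the mean makes one side larger than the other.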

JMP
  • 21,771
  • I think it helped, but it’s hard for me to understand this sigma notation. Can you write the mathematical part in a way that a novice like me can understand? – Vinay Sharma Apr 04 '22 at 10:01
  • There's a big page on it here: https://en.wikipedia.org/wiki/Summation. $\sum_{i=1}^3 x_i$ means $x_1+x_2+x_3$. – JMP Apr 04 '22 at 10:03
  • Can you explain what $ky$ in the first equation means? – Vinay Sharma Apr 04 '22 at 10:27
  • $\sum_{i=1}^k (y-x_i) = \sum_{i=1}^k y - \sum_{i=1}^k x_i = ky - \sum_{i=1}^k x_i$. Adding a constant $c$ to itself $k$ times gives $kc$; here the constant is $y$, so the sum is $ky$. – JMP Apr 04 '22 at 10:42
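
For readers who find the sigma notation hard to follow, the same balance argument can be written out in full for three concrete numbers. This is only an illustrative sketch using the values $-4, 2, 11$ from the comments under the question, so $n = 3$, and it assumes the pivot $y$ lies between $2$ and $11$, so $k = 2$ points are on the left.

Left side (distances from $-4$ and from $2$ up to $y$):

$$(y - (-4)) + (y - 2) = 2y + 2$$

Right side (distance from $y$ up to $11$):

$$11 - y$$

Setting the two sides equal and solving:

$$2y + 2 = 11 - y$$

$$3y = 9$$

$$y = 3 = \frac{-4 + 2 + 11}{3}$$

The result $y = 3$ does lie between $2$ and $11$, so the assumption about which points are on the left is consistent.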