
This may be, to some extent, a philosophical question, and I might know the answer already, but I am interested in what others think.

My question is roughly the following. Let's say I am reading some new math material; take statistics as my example, though the same could apply to any branch of math.

I come across the population or sample mean, and the definition is something like: $$ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i $$

For some reason this is intuitive enough: the average is the sum of the values divided by how many there are.

Now you get to the equation for variance: $$ S^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2 $$

Now, according to my understanding, the variance is a measure of how far the data spread from the mean. So "distance from the mean" translates to the subtraction in the equation. All good; then comes the squaring, ^2.

Looking around online, I find many explanations for why there is a squaring operation in there: to get rid of negative values, to allow for analysis of a continuous function, and so on and so forth.
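To make the "get rid of negative values" explanation concrete, here is a minimal Python sketch. The function name `spread_measures` and the sample data are made up for illustration; it only compares three candidate measures of spread and does not claim to show how the formula was historically arrived at.

```python
def spread_measures(xs):
    """Return the mean and three candidate 'spread' statistics for a list of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    deviations = [x - mean for x in xs]
    return {
        "mean": mean,
        # Signed deviations cancel each other out, so this is always ~0.
        "mean_signed_deviation": sum(deviations) / n,
        # Absolute values remove the sign but are not differentiable at 0.
        "mean_absolute_deviation": sum(abs(d) for d in deviations) / n,
        # Squaring removes the sign and gives the variance formula above.
        "variance": sum(d * d for d in deviations) / n,
    }


# Hypothetical sample data, chosen so the arithmetic is exact.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
result = spread_measures(data)

for name, value in result.items():
    print(f"{name}: {value}")
```

For this data the signed deviations average to exactly zero even though the values are clearly spread out, while the absolute and squared versions both give a positive number. That shows why some sign-removing step is needed, but not why squaring in particular was chosen.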

I find myself in this situation very often: I see an equation and I don't understand how it came about. It seems as if the inventor or author just had an intuition that they didn't document, or maybe it's trivial and it's up to the reader to work it out.

What is it, though? Why isn't such information captured? Is it just a matter of practice? Or do we not even care, and just study useful properties and whether things make sense? How do mathematicians or scientists arrive at such conclusions? A hunch? Trial and error?

  • Good and thorough question. If I have some time, I will try to answer it, very humbly. – Jean Marie May 02 '20 at 15:26
  • You could begin by reading this – Jean Marie May 02 '20 at 15:29
  • You could also have a look at this. – Jean Marie May 02 '20 at 15:30
  • Basically, it is to prevent two points "very far" from the mean, one on the positive side and one on the negative side, from cancelling each other and conveying the wrong information that there is no dispersion at all. – Mauro ALLEGRANZA May 02 '20 at 15:33
  • @Mauro ALLEGRANZA Sorry, I don't want to be polemical, but this is a perfect example of an "A Posteriori Justification" (AJ): in this case, why not take the absolute values instead of the squares? (It has been attempted, but doesn't give good results... because you use a non-differentiable indicator, but I realize that my "because" is also a kind of AJ...) Why not take the fourth power (which has a meaning as the "fourth moment")? Answer: because it is too complicated, or it exaggerates the distances, etc. But this argument is still in the "AJ" category. Is there a "true" justification? – Jean Marie May 02 '20 at 17:19
  • @JeanMarie - Do not worry :-) We are here (also) to exchange points of view. My point is that, IMO, there is nothing "philosophical" regarding variance. – Mauro ALLEGRANZA May 02 '20 at 17:26
  • @Mauro ALLEGRANZA I mostly agree with you. One thing is certain: since the importance of variance (mainly through its square root $\sigma$) was only recognized in the 1900s, this parameter evidently wasn't that intuitive... – Jean Marie May 02 '20 at 17:36
  • Thanks for the great links @JeanMarie. While I might be going down the rabbit hole, I am doing it intentionally here. My problem probably lies in not doing enough exercises and problem solving (I was hoping I could go faster by skipping them, as I am re-learning math in my free time, which I don't have much of). – Ibrahim Najjar May 03 '20 at 00:01
  • I just want to stress again: while my example was about variance, the situation applies in many other branches of math; it just happens that I am reading a lot of statistics these days and this is the first example that came to mind. – Ibrahim Najjar May 03 '20 at 00:02

1 Answer


You are asking at least two related, but very different, questions: "How do we arrive at these formulas?" and "Why don't we tell people how we arrive at these formulas?"

When Ramanujan was asked how he came to his formulas, he said a god would give them to him. So I think that at least sometimes we really have no idea how we come to our formulas, and that would explain why we don't tell people how we arrive at these formulas; we don't, because we can't, since we don't know ourselves.

If we are writing a formula we didn't find ourselves, but got from someone else, we may have the same reason for not telling people how the formulas were originally discovered; we simply don't know how the first person to discover them got there, and we're just copying them (after having verified them, of course) from the person who discovered them (or copying from someone who was copying from someone who was copying from someone and so on and so on).

And even if we know how a formula was discovered, there are considerations of space. Textbooks are hundreds of pages long as it is – if they included the origins of all the results they use, you wouldn't be able to lift them. Papers in journals are expensive to print, and anything that makes them longer makes them more expensive, so some journals frown on too much expository work. And some authors figure their readers aren't interested in where the results come from, or maybe that their readers are bright enough to figure it out on their own.

I'm probably leaving out a lot of stuff.

Gerry Myerson