8

(I'm a nooby in probability)

So why are IQ test results normally distributed? Or, more precisely, what hypotheses and theorems imply this distribution?

Does it have to do with the central limit theorem? (But that theorem is about the arithmetic mean of i.i.d. variables, and I don't see any i.i.d. variables here: I suppose it's not one person repeating the test. Is it a person's set of skills that is treated as the random variable?)

LLuu
  • 107
  • I'm guessing it's just experimentally verified. It's not a true normal distribution of course. Many things in nature obey a normal distribution (or something close). – Cameron Williams Jun 03 '15 at 21:17
  • According to H. J. Eysenck, IQ is the overall productivity of thinking processes -- that is the random variable. – Alexey Burdin Jun 03 '15 at 21:24
  • 4
    IQ is adjusted by fiat so that the distribution of scores is normal with a mean of 100 and a S.D. of 15. You can always map the scores so that any distribution, whatever it is, becomes normal. – Ron Maimon Jun 03 '15 at 21:25
  • 5
    @RonMaimon If you can adjust the mean and S.D., can you adjust the shape of the distribution too?? – LLuu Jun 03 '15 at 21:38
  • This is a very good question. There is absolutely no mathematical reason for IQ test results to be normally distributed. And I never realised it! In practice, it turns out that they are approximated closely by a normal distribution. But this is just luck. – TonyK Jun 03 '15 at 21:57
  • 2
    The (over a century old) history of IQ tests includes a number of dubious statistical practices, such as "removal" of inconvenient data and "re-designing" the questions to produce the "expected" results. It cannot really be said to measure "human intelligence" as we have since come to understand that term. The "defined" normal distribution is more imposed than observed. But it is so useful for some parties to "sieve" people by abusing the number that it has been difficult to get its application dropped entirely... – colormegone Jun 04 '15 at 03:07
  • 5
    @LLuuUsErI132UOutOfMemory: You adjust the distribution by defining the original score I(n) as some increasing function of the number n of correct answers, then find the empirical distribution f(I) of the unnormalized I scores in the population, and then you reparametrize I by defining a new score function g so that f(I(g)) dI/dg is a Gaussian with mean 100 and s.d. 15. You can always do it by relabeling the x axis; every 1-d distribution can be reparametrized to any other. They also have baskets of gender-biased questions, and they also adjust the test to make sure female and male IQ is equal. – Ron Maimon Jun 09 '15 at 02:29
  • 1
    @TonyK: IQ is normal by definition, you parametrize the raw score by brute force so that the distribution ends up normal. If you use any natural mathematical metric for difficulty of questions, like "size of search space" in a chess problem, or "number of steps of deduction" in a mathematical problem, basically anything, in the natural metric, the distribution of humans would be a power-law like distribution with a heavy tail and different individuals would perform astronomically better at some tasks than others. – Ron Maimon Jun 09 '15 at 02:34
  • @RonMaimon, whether human skill test outcome distributions are normal versus power law is something I find very interesting, and your answer has shed some light for me on how the resulting distributions are formed. So for example, if a bunch of students are taking a multiple choice test, a normal distribution can easily form. What I would really like to find is a good paper on this topic, which thoroughly explores the distribution types resulting from different manners of measuring human skill. Do you know of any such papers? (Or anyone else?) (Should I pose a new question?) – Cameron Jul 30 '16 at 07:50
  • @Cameron: There is no paper as far as I know, I noticed myself. Tests of expertise with a natural measure, e.g. chess problems, have a distribution of performance which is roughly exponentially distributed in a logarithmic metric, i.e. novices can solve any 1 move checkmate, half can solve two move checkmates, half again 3 move checkmate, and 20 move sequences are for world champions. This is a powerlaw, as the search space grows exponentially in the number of moves. Similarly with mathematical proofs, or Go. IQ shoehorns it into a Gaussian. It's not a Gaussian, because it's not genetic. – Ron Maimon Aug 01 '16 at 20:44

3 Answers

13

As Ron Maimon has said in the comments, the IQ scale is defined so that it gives a normal distribution with a mean of $100$ and a standard deviation of $15$. This is possible for any test score with a continuous distribution $f$. If the subject's score on the test is $s$, their IQ will be given by:

$$\text{IQ}=100+15\sqrt 2\;\text{erfc}^{-1}\left(2-2\int_{-\infty}^sf(x)dx\right)$$

To see that this gives a normal distribution, invert the equation above:

$$\int_{-\infty}^sf(x)dx = 1 - \frac{1}{2}\;\text{erfc}\left(\frac{\text{IQ} - 100}{15\sqrt 2}\right)$$

The left-hand side is the cumulative distribution function (CDF) of the test scores and the right-hand side is the expression for the CDF of a normal distribution.
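As a quick sanity check of this formula (a minimal sketch; `p_to_iq` is just a throwaway name for the conversion above), the median of the test-score distribution maps to an IQ of exactly $100$, and the 84.1st and 97.7th percentiles map to roughly $115$ and $130$, i.e. one and two standard deviations above the mean:

import numpy as np
from scipy.special import erfcinv

def p_to_iq(p):
    # p is the percentile of the raw test score, i.e. the CDF value in the formula above
    return 100 + 15*np.sqrt(2)*erfcinv(2 - 2*p)

print(p_to_iq(0.5))     # 100.0
print(p_to_iq(0.8413))  # ~115
print(p_to_iq(0.9772))  # ~130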


It may sound weird to define IQ so that it fits an arbitrary distribution, but that's because IQ is not what most people think it is. It's not a measurement of intelligence, it's just an indication of how someone's intelligence ranks among a group:

The I.Q. is essentially a rank; there are no true "units" of intellectual ability.

[Mussen, Paul Henry (1973). Psychology: An Introduction. Lexington (MA): Heath. p. 363. ISBN 978-0-669-61382-7.]

In the jargon of psychological measurement theory, IQ is an ordinal scale, where we are simply rank-ordering people. (...) It is not even appropriate to claim that the 10-point difference between IQ scores of 110 and 100 is the same as the 10-point difference between IQs of 160 and 150.

[Mackintosh, N. J. (1998). IQ and Human Intelligence. Oxford: Oxford University Press. pp. 30–31. ISBN 978-0-19-852367-3.]

When we come to quantities like IQ or g, as we are presently able to measure them, we shall see later that we have an even lower level of measurement—an ordinal level. This means that the numbers we assign to individuals can only be used to rank them—the number tells us where the individual comes in the rank order and nothing else.

[Bartholomew, David J. (2004). Measuring Intelligence: Facts and Fallacies. Cambridge: Cambridge University Press. p. 50. ISBN 978-0-521-54478-8.]

From those quotes, you can deduce that any other information about the original distribution of scores in the actual test used to measure intelligence, like skewness and kurtosis, is simply lost.


The reason for choosing a Gaussian, and with those parameters, is mostly historical, but it's also very convenient. It turns out that you'd need about as many people as have ever existed for someone to be ranked at an IQ of zero and someone else at $200$ (roughly $1$ in $76$ billion), so in practice IQ is limited to the interval $[0, 200]$ (no, there's no such thing as a $300$ IQ. Sorry, Sidis). Likewise, the dumbest person alive would have an IQ of about $5$ and the smartest about $195$ ($1$ in $8.3$ billion). If you could theoretically apply the same test to trillions of people, then you'd get IQs above $200$ and even negative IQs. Obviously, the results will be different for different tests, and you might question whether any of the tests really have anything to do with actual intelligence.
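Those tail figures can be checked directly from the normal survival function (a minimal sketch; it only recomputes the numbers quoted above):

from scipy.stats import norm

# Probability of an IQ of 200 or above (equivalently, 0 or below) on an N(100, 15) scale
p_200 = norm.sf(200, loc=100, scale=15)   # ~1.3e-11, i.e. roughly 1 in 76 billion
# Probability of an IQ of 195 or above
p_195 = norm.sf(195, loc=100, scale=15)   # ~1.2e-10, i.e. roughly 1 in 8.3 billion

print(1/p_200, 1/p_195)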

To illustrate how you'd calculate IQ in practice, I made a Python script which takes an arbitrary distribution of scores in a test, generates a sample of $10$ thousand results and uses that to calculate the IQ of an additional participant based on their score in the same test. The plot shows the distribution of scores and how it's transformed into a normal distribution when converted to IQ. The score and IQ of the new participant are shown in red.

import bisect
import numpy as np
from scipy.special import erfcinv
from scipy.stats import rv_continuous

N_SAMPLES = 10000

np.random.seed(0)

# Start with any continuous distribution for the test scores.
# In this case, it's a multimodal distribution.
def scores_pdf(x):
    return (2 - np.cos(2*np.pi*x/5))/20

class dist_gen(rv_continuous):
    def _pdf(self, x):
        return scores_pdf(x)

# We restrict the score to be between zero and ten.
dist = dist_gen(a=0, b=10)

scores = dist.rvs(size=N_SAMPLES)
scores = list(sorted(scores))

# Convert from percentile to IQ.
def p_to_iq(p):
    return 100 + 15*np.sqrt(2)*erfcinv(2 - 2*p)

# The scores are not even used yet, only their ordering.
iqs = [p_to_iq(k/N_SAMPLES) for k in range(1, N_SAMPLES)]

# Calculating the percentile for a finite sample set depends on
# how it is defined. This function returns the smallest and the
# highest values.
def get_percentile_bounds(score):
    size = len(scores)
    # Number of samples lower than 'score'
    lower = bisect.bisect_left(scores, score)
    # Number of samples greater than 'score'
    greater = size - bisect.bisect_right(scores, score)
    return lower/(size+1), (size-greater+1)/(size+1)

# We use only the definition of percentile closest to the average.
# (This is arbitrary and irrelevant for large sample sets.)
def get_iq(score):
    p1, p2 = get_percentile_bounds(score)
    if abs(p1-0.5) < abs(p2-0.5):
        return p_to_iq(p1)
    return p_to_iq(p2)

# A new subject performs the test.
new_score = dist.rvs()
new_iq = get_iq(new_score)
print(new_score, new_iq)

import matplotlib.pyplot as plt
from scipy.stats import norm as gaussian

N_BINS = 100

def highlight_patch(value, patches, axis, fmt):
    left, bottom = patches[0].get_xy()
    width = patches[0].get_width()

    n = int((value - left)//width)
    patches[n].set_fc('r')

    x = left + (n+0.5)*width
    y = bottom + 0.7*patches[n].get_height()
    axis.text(x, y, fmt%(value),
              horizontalalignment='center',
              verticalalignment='center',
              bbox=dict(facecolor='white', alpha=0.9))

fig, axes = plt.subplots(nrows=2, ncols=1)

n, bins, patches = axes[0].hist(
    scores, bins=N_BINS, density=True, edgecolor='black')
highlight_patch(new_score, patches, axes[0], '%.2f')
x = np.linspace(scores[0], scores[-1], num=200)
axes[0].plot(x, [scores_pdf(k) for k in x], ':')
axes[0].set_title('Test scores distribution')

n, bins, patches = axes[1].hist(
    iqs, bins=N_BINS, density=True, edgecolor='black')
highlight_patch(new_iq, patches, axes[1], '%.0f')
x = np.linspace(iqs[0], iqs[-1], num=200)
axes[1].plot(x, [gaussian.pdf(k, 100, 15) for k in x], ':')
axes[1].set_title('IQ distribution')

fig.tight_layout()
plt.show()

[Plot: histogram of the test scores (top) and of the corresponding IQs (bottom), with the new participant's score and IQ highlighted in red.]

Wood
  • 1,880
5

It is an empirically observed fact that many "naturally" observed traits, like height or IQ, are NOT normally distributed. At the very least they can't be truly normally distributed, because they are always non-negative. But even beyond the non-negativity issue, it has been observed that the "tails" (values many standard deviations away from the mean) tend to have higher probability than a normal distribution would predict for the population, at least for certain traits. The only thing you can say is that if you take many samples and compute the mean, then the empirical mean of the sample will be approximately normally distributed under mild assumptions, provided you have enough samples (this is the central limit theorem).

As an aside, if you'd like a speculative theory for why many traits appear "somewhat normal", just consider the possibility that many factors affect the trait, e.g. many genetic factors and many environmental factors. If you have many factors and their effects are additive and you don't have too crazy distributions for each factor's effect, and the factors are independent enough, then the accumulated effect should be somewhat normal basically by the central limit theorem.
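A minimal sketch of that additive-factors argument (purely synthetic numbers, not real trait data): summing many independent, individually skewed factor effects already gives an approximately normal total.

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)

n_people = 100_000
n_factors = 50

# Each factor's effect is exponentially distributed -- strongly skewed, far from normal.
effects = rng.exponential(scale=1.0, size=(n_people, n_factors))

# The trait is the sum of all factor effects for each person.
trait = effects.sum(axis=1)

# For the sum, skewness (about 2/sqrt(50) = 0.28) and excess kurtosis (about 6/50 = 0.12)
# are already close to the normal distribution's values of 0.
print(skew(trait), kurtosis(trait))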

user2566092
  • 26,142
  • 1
    True, but if you look at the general population trends, the bell curve approximates a normal distribution. We all know that IQ cannot be negative, but the mean is enough standard deviations away from 0 that the curve is essentially a normal distribution curve. To your point about the tails being too common, is that because sampled traits such as IQ must occur in integral sample counts $x\in \mathbb{N}$, thereby reducing the sample probability resolution to being no better than $\frac{1}{N_{samples}}$? – FundThmCalculus Jun 03 '15 at 21:25
  • 3
    @FundThmCalculus No I'm saying that if you fit a normal to a large sample of say height, then when you evaluate the likelihood of the data under that normal you will tend to get significantly lower likelihood than an equally large sample under that normal distribution, and the reason is that the outliers are either too common and/or too far away from the mean. – user2566092 Jun 03 '15 at 21:29
  • So this is not a consequence of a theorem under reasonable hypotheses? The shape of the distribution could be asymmetrical, for example, but the results of the tests have been compiled and give an approximately normal distribution? But then perhaps smart people avoid IQ tests; does that change something? (again: nooby) – LLuu Jun 03 '15 at 21:56
  • 5
    What you are missing is that if you don't have a definition for the x axis, you can always parametrize the x axis so whatever distribution is normal. I said it already 4 times here, this answer is incorrect. The IQ test is defined so that the population results are normal with mean 100 and s.d. 15, this replaced the earlier definition of 100 times the ratio "mental age"/"chronological age" which gave age-dependent results, but at least had an objectively defined x axis, and identified prodigies. The modern IQ "g" is just defined to be normally distributed, it's true by definition. – Ron Maimon Jun 09 '15 at 02:36
  • @RonMaimon Ok, thanks for your answer. What should I do if the answer is a comment? – LLuu Jun 10 '15 at 17:28
  • @LLuuUsErI132UOutOfMemory: Just modify your answer slightly, and it will be accurate. This site is very annoying, and I don't want to help it along, although it is somewhat less annoying than other sites on the network. – Ron Maimon Jun 11 '15 at 17:32
  • 1
    Height and IQ are fundamentally different. Height is measured using ratio scales, which have a true zero. Two 50-cm tall objects stacked will have the same height as one 100-cm tall object, and each incremental unit is the same magnitude. IQ is totally different. It's an ordinal scale that implies an order but no other mathematical meaning. We could label IQ scores using alphabetically-ordered letters just the same. "100" does not mean twice as smart as "50." There is no true zero, and a negative IQ is possible for a huge population that permits a low enough percentile rank (z-score < -6.67). – zunojeef Dec 10 '21 at 06:35
0

I suggest the distribution of IQs may be log-linear (not log-normal). Such distributions, often called Gibbs distributions (after Gibbs, who first applied them to the distribution of energy and built a strong foundation for thermodynamics in 1878; cf. Boltzmann), can be applied to positive-definite variables that have an 'energy' connotation.

It works for me with natural remotely sensed imagery (hurricanes, sea ice, rough ocean surface, cold front occurrences), and even heart-beat variation, BUT only above some threshold. On occasion the down (below-average) side can also be log-normal (not necessarily of the same slope).

I'm looking for some data with a sample size large enough to resolve the large deviations from most probable. If anyone has a good suggestion in that regard, please pass it along.

If it turned out to be the case that IQs had a log-linear distribution, I would suggest that the IQ variable is acting as an 'energy'. If so, I would ascribe that 'energy' to a person's ability to concentrate/focus (as in the colloquial phrase 'brain energy'), which would involve the reduction in confusion/entropy associated with tasks.
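For concreteness, here is a minimal sketch (synthetic data only, no claim about actual IQ scores) of what such a log-linear / Gibbs-type tail looks like: above a threshold, the log of the frequency falls off linearly with the score, so the histogram is a straight line on a semi-log plot.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

# Synthetic "scores" with an exponential (Gibbs-like) tail above a threshold.
threshold = 100.0
temperature = 15.0  # sets the slope of the tail
samples = threshold + rng.exponential(scale=temperature, size=100_000)

counts, edges = np.histogram(samples, bins=60)
centers = 0.5*(edges[:-1] + edges[1:])

# On a semi-log plot the counts fall on a straight line:
# log p(x) ~ const - (x - threshold)/temperature
plt.semilogy(centers[counts > 0], counts[counts > 0], 'o')
plt.xlabel('score')
plt.ylabel('count (log scale)')
plt.show()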