
I was looking into free parameters and found this nice previous answer to:

What is a “free parameter” in a computational model?

Is it possible to actually demonstrate that, with $x$ free parameters, any function can be approximated?

For example, in $A = aB^2 + bB + c$, could the free parameters $a$, $b$, $c$ allow me to turn this into whatever shape I want? Or are there limits to this?

magnolia1

2 Answers


(1) Any function can be modelled by itself, using no free parameters: $f(x) = f_1(x)$, where $f_1(x) := f(x)$.

(2) For any model with an arbitrary but finite number of degrees of freedom, $a_1 f_1(x) + a_2 f_2(x) + \dots + a_n f_n(x)$, there always exists a function that is far from anything the model can represent. (This can be given a precise mathematical formulation. With your quadratic model, for example, you cannot model a function that has both a maximum and a minimum, since a parabola has only one extremum.)
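As a quick numerical illustration of (2), here is a minimal sketch in Python (assuming `numpy`; the target function $\sin$ is my choice): no values of $a, b, c$ bring the best-fit parabola close to $\sin(x)$ on $[0, 2\pi]$, precisely because $\sin$ has both a maximum and a minimum there:

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
f = np.sin(x)  # has a maximum (at pi/2) and a minimum (at 3*pi/2)

# Least-squares fit of the quadratic model a*x^2 + b*x + c
a, b, c = np.polyfit(x, f, deg=2)
fit = a * x**2 + b * x + c

# The uniform error stays bounded away from zero, no matter how
# a, b, c are chosen: a parabola has only one extremum.
print("max |f - fit| =", np.max(np.abs(f - fit)))
```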

What von Neumann meant (with his famous quip about fitting an elephant with four parameters) was along the lines of statement (1): since there are infinitely many ways to choose a model (the model itself, i.e. the functions $f_i$, not the parameters of a fixed model), it is not a great feat to find good fits to data even with small models. Even though standard models might not contain the function $f$ you are trying to model, if you look through enough models with 4 degrees of freedom, you will find an exotic one that models your data very closely just by coincidence.

Bananach

No, a fixed number of parameters in a fixed parametrisation is not enough to approximate an arbitrary function $f$.

If you are interested only in a particular class of functions, and the model is true for that class, then yes, you can approximate all of them with a fixed number of parameters. For instance, if the functions $f$ you want to approximate happen to all be real-valued, with a real argument, and linear, then you only need 2 parameters to approximate any such function, since $f(x) = ax + b$ by assumption. But as soon as the assumption is false, 2 parameters won't be enough.
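A minimal sketch of this point (Python with `numpy`; the example functions are illustrative): the 2-parameter model $ax + b$ recovers a linear $f$ exactly, but no choice of $a, b$ helps once $f$ is nonlinear:

```python
import numpy as np

x = np.linspace(-1, 1, 100)

# The assumption "f is linear" holds: the fit is (numerically) exact.
f_linear = 3 * x - 2
a, b = np.polyfit(x, f_linear, deg=1)
print(a, b)  # ~3.0 and ~-2.0

# The assumption fails: the residual stays far from zero.
f_quadratic = x**2
a, b = np.polyfit(x, f_quadratic, deg=1)
print(np.max(np.abs(f_quadratic - (a * x + b))))
```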

More generally, a model is a simplifying assumption about the form of $f$. For instance, one can choose the model to be a finite-dimensional subspace of a function space that is dense in the space where you assume $f$ lives.

E.g. if you assume $f$ to be continuous, then by Stone-Weierstrass you know that for every $\epsilon > 0$ there is some polynomial $p$ of unknown degree $n = n(f, \epsilon)$ uniformly approximating it within distance $\epsilon$. So in principle you need only $n + 1$ parameters to get $\epsilon$-close to a fixed $f$, but you don't know $n$ in advance, since it changes with each $f$. You can try a polynomial space of fixed high degree, knowing that your approximation will almost certainly be off, unless the $f$ generating the data happens to be a polynomial of degree less than or equal to the degree you fixed.
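To see the dependence on $n$ concretely, here is a minimal sketch (Python with `numpy`; $f(x) = |x|$ is my choice of a continuous but non-smooth target): the error of the best-fit polynomial shrinks as the degree grows, but a different $f$ would need entirely different degrees for the same $\epsilon$:

```python
import numpy as np

x = np.linspace(-1, 1, 500)
f = np.abs(x)  # continuous but not smooth: needs a fairly high degree

for n in (2, 4, 8, 16):
    p = np.polyfit(x, f, deg=n)  # least-squares fit of degree n
    err = np.max(np.abs(f - np.polyval(p, x)))
    print(f"degree {n:2d}: max error {err:.4f}")
```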

Another example of a model is a truncated Fourier series. Under mild assumptions on $f$, Fourier series are good approximators. The more parameters (coefficients of the series) you add, the closer your approximation will be to the true function, but this proximity depends on the regularity of $f$, about which you know nothing a priori. So again: fix a space with, say, 50 coefficients, and there will always be an integrable function which cannot be well approximated with that number. For instance, if $f$ has any jumps, there will be issues like the Gibbs phenomenon.
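The jump case is easy to reproduce (a minimal sketch in Python with `numpy`, using a square wave as the discontinuous $f$): the partial sums overshoot near the jump by roughly 9%, and adding more terms never removes the overshoot:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 4001)

# Fourier series of the square wave sign(sin x):
# (4/pi) * sum over k of sin((2k+1)x) / (2k+1)
for n_terms in (5, 50, 500):
    partial = sum(4 / np.pi * np.sin((2 * k + 1) * x) / (2 * k + 1)
                  for k in range(n_terms))
    # The true function never exceeds 1; the partial sums peak at ~1.09
    # near the jumps regardless of n_terms (Gibbs phenomenon).
    print(f"{n_terms:3d} terms: max value {partial.max():.4f}")
```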

Neural networks are another family of good approximators ("good" meaning: with few parameters) for certain very general classes of functions, but it can be proven that they can also be very bad, requiring an exponentially large number of parameters to approximate some functions.

If instead you fix the number of parameters and swap the model, then you can improve the fit, i.e. reduce the distance between $f$ and the best approximation in your space. This leads back to the assertion above: there will always be some space where just a few parameters are enough to fit your data (assuming no noise).
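A minimal sketch of that last point (Python with `numpy`/`scipy`; the data and both model families are my choices): with 3 parameters each, one model fits this data essentially perfectly and the other cannot, so it is the choice of space, not the parameter count, that makes the difference:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.linspace(0, 10, 200)
y = 2.0 * np.sin(1.5 * x) + 0.5  # noiseless data from a sinusoidal "truth"

def quadratic(x, a, b, c):  # 3 parameters, wrong family
    return a * x**2 + b * x + c

def sinusoid(x, a, b, c):   # 3 parameters, right family
    return a * np.sin(b * x) + c

for model in (quadratic, sinusoid):
    # Frequency fits are sensitive to the starting value, so the initial
    # guess is chosen near the true frequency.
    params, _ = curve_fit(model, x, y, p0=(1.0, 1.4, 0.0))
    err = np.max(np.abs(y - model(x, *params)))
    print(f"{model.__name__}: max error {err:.4f}")
```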

Miguel
    Misread the question! A fixed number of parameters cannot approximate any function, of course – Miguel Oct 25 '18 at 07:35