Can a simple distance to a few nearest data points be used as a measure of the uncertainty of a prediction?

Question

One of the 'selling points' of Gaussian process regression is that it provides not only a model but also an uncertainty estimate for each prediction. This is usually illustrated with a picture of a curve fitted to the data and a shaded band around it showing the uncertainty (e.g. link). That band gets wider further from the data points and shrinks to zero at the points.

My question is:

Is a Gaussian process necessary for that? Can a simple distance to a few nearest data points be used as a measure of the uncertainty of a prediction? And thus, can any machine learning method be considered a probabilistic method?

I mean, the closer we are to the training data, the less uncertainty we have about our prediction. Conversely, if we are far from the training data, we can't be sure that our prediction is accurate.

score 0 · Answer 1 · answered Jul 04 '22 at 16:11

Is a Gaussian process necessary for that?

No - There are many ways to handle uncertainty in machine learning. A Gaussian process is one of them (it is usually classed as a non-parametric Bayesian method). You can also use other non-parametric methods like kernel density estimates (KDE).

Can a simple distance to a few nearest data points be used as a measure of the uncertainty of a prediction?

Yes - For models that use distance metrics. Not all machine learning models use distance metrics.
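As a rough illustration of the idea, here is a minimal sketch that scores a query point by its mean Euclidean distance to the k nearest training points. The function name `knn_distance_score`, the choice of k, and the toy data are assumptions made up for this example. The score is only a heuristic: it is not a calibrated variance like the one a Gaussian process returns, and it ignores noise in the targets.

```python
import math

def knn_distance_score(x_query, X_train, k=3):
    """Mean Euclidean distance from x_query to its k nearest training points.

    Hypothetical helper for illustration: a larger score means the query
    lies further from the training data, so the prediction is less trusted.
    """
    if not X_train:
        raise ValueError("X_train must not be empty")
    # Distances to every training point, smallest first.
    distances = sorted(math.dist(x_query, x) for x in X_train)
    k = min(k, len(distances))
    return sum(distances[:k]) / k

# Toy training inputs (made up): the corners of the unit square.
X_train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

near = knn_distance_score((0.5, 0.5), X_train, k=2)  # inside the data
far = knn_distance_score((5.0, 5.0), X_train, k=2)   # far from the data
print(near < far)
```

With k=1 the score is exactly zero at a training point, which mimics the way the band shrinks at the data, but turning the score into an actual interval would still need some calibration against held-out errors.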

And thus, can any machine learning method be considered a probabilistic method?

No - There are many machine learning methods that are not probabilistic. One example is a decision tree.

Decision trees as probabilistic classifiers – Vladislav Gladkikh, Jul 05 '22 at 13:07
