10

I've read in many papers on higher-dimensional nearest neighbor search that KD-Trees are exponential in K, but I can't seem to determine why.

What I'm looking for is a solid runtime-complexity analysis which explains this aspect of the problem.

Raphael
  • 72,336
  • 29
  • 179
  • 389
akdom
  • 200
  • 5
  • Quick thought is that k is effectively the dimension of the problem and so it suffers from the "curse of dimensionality." – Michael Klein Feb 21 '16 at 06:36

1 Answers1

2

kNN tends to be exponential because the search space increases with $2^k$. Imagine you partition the space around your search point into quadrants. For k=1 you just have to search two 'quadrants' (higher and lower values), for k=2 it's 4 quadrants, for k=3 it's 8 quadrants, i.e. exponential growth of search space. That is what the kD-tree suffers from, because it has to search $2^k$ sub-branches.

Other trees perform much better, for example the CoverTree. I also found that the PH-Tree works quite well, it seems to consistently take twice as long as the CoverTree for datasets between k=8 and k=27 (I didn't have datasets with higher k).

TilmannZ
  • 764
  • 4
  • 6