I'm working on a system that reads 3 images per second and stores them in a collection.
For each image I have all its keypoints and descriptor vectors, computed with the ORB detector. On average there are 250 features per image.
To add a new image to the collection, I need to check if a similar image already exists or not.
What is the most efficient way to compare the images using the descriptors/keypoints?
What I tried:
Brute force
Compare each descriptor vector in the new image with every descriptor of every image in the collection. This takes a long time, and the time keeps growing with the collection size.
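To make the cost concrete, here is a minimal pure-Python sketch of the brute-force approach. It assumes each descriptor is a 32-byte string (the default ORB descriptor size in OpenCV). The names `count_good_matches` and `find_similar` and the thresholds `max_dist` and `min_matches` are made up for illustration and are not tuned values. With about 250 descriptors per image, each image pair costs roughly 250 x 250 = 62,500 Hamming distances, so a collection of N images costs about 62,500 x N per new image.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def count_good_matches(query, candidate, max_dist=40):
    """Count query descriptors whose nearest candidate descriptor
    is at most max_dist bits away (max_dist is an illustrative value)."""
    if not candidate:
        return 0
    good = 0
    for q in query:
        best = min(hamming(q, c) for c in candidate)
        if best <= max_dist:
            good += 1
    return good


def find_similar(new_image, collection, min_matches):
    """Return the index of the first stored image that shares at least
    min_matches good matches with new_image, or None if there is none."""
    for idx, stored in enumerate(collection):
        if count_good_matches(new_image, stored) >= min_matches:
            return idx
    return None
```

Every call to `find_similar` scans the whole collection, which is why the running time grows linearly with the number of stored images.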
Reference vector comparison
Decide on a reference vector, for example (1,1,1,...,1), and calculate the distance between each descriptor vector and the reference vector. Then use that distance as a measure of descriptor similarity: if two descriptors have the same distance from the reference vector, they are treated as similar.
While this is very efficient computationally, it doesn't yield the correct result in most cases, especially with ORB, where the distance is the Hamming distance.
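A small counterexample shows why this fails with Hamming distance. With an all-ones reference vector, the distance to the reference is just the number of zero bits in the descriptor, so any two descriptors with the same number of zero bits look "similar" no matter which bits they are. The sketch below assumes 32-byte (256-bit) descriptors; `d1` and `d2` are hand-made values, not real ORB output.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


ref = bytes([0xFF] * 32)  # all-ones reference vector, 256 bits
d1 = bytes([0xF0] * 32)   # high nibble of every byte set
d2 = bytes([0x0F] * 32)   # low nibble of every byte set

dist_d1_ref = hamming(d1, ref)  # 128
dist_d2_ref = hamming(d2, ref)  # 128
dist_d1_d2 = hamming(d1, d2)    # 256: every single bit differs
```

Both descriptors are exactly 128 bits from the reference, yet they are as far apart from each other as two 256-bit vectors can be. The triangle inequality only guarantees that the distance between two descriptors is at least the difference of their reference distances, so the reference distance can rule out some pairs but can never confirm a match.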
Reducing to smaller set of vectors
This is a half-baked idea: from each group of vectors with a certain distance to the reference vector, take the K vectors nearest to the reference vector. The idea is to reduce the number of descriptor vectors to compare against.