
I'm working on a system that reads 3 images per second and stores them in a collection.

For each image I have all its keypoints and descriptor vectors, obtained with an ORB detector. On average there are 250 features per image.

To add a new image to the collection, I need to check if a similar image already exists or not.

What is the most efficient way to compare the images using the descriptors/keypoints?

What I have tried:

Brute force

Compare each descriptor vector in the new image with every descriptor vector of every image in the collection. This takes a lot of time, and the time grows linearly with the collection size.
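For reference, a minimal sketch of this brute-force matching with OpenCV's BFMatcher and Hamming distance (the metric for binary ORB descriptors); the thresholds and the `collection` structure are illustrative assumptions, not tuned values:

```python
import cv2

def is_similar(new_desc, stored_desc, ratio=0.75, min_matches=40):
    # Lowe's ratio test: keep a match only if it is clearly better than
    # the second-best candidate. `ratio`/`min_matches` are assumed values.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(new_desc, stored_desc, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) >= min_matches

def exists_in_collection(new_desc, collection):
    # `collection` is a list of per-image descriptor arrays; every stored
    # image is scanned, so the cost grows linearly with the collection.
    return any(is_similar(new_desc, desc) for desc in collection)
```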

Reference vector comparison

Decide on a reference vector, for example (1, 1, 1, ..., 1), and calculate the distance between each descriptor vector and the reference vector, then use that distance as a similarity proxy: if two descriptors have the same distance from the reference vector, they are considered similar. While this is computationally very efficient, it yields incorrect results in most cases, especially with ORB, where the distance is the Hamming distance and many unrelated descriptors share the same distance to the reference.
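To make the efficiency claim concrete: with 32-byte ORB descriptors stored as rows of a NumPy uint8 array, the whole reduction is one vectorized pass. A sketch, assuming an all-ones reference:

```python
import numpy as np

REF = np.full(32, 0xFF, dtype=np.uint8)  # 32 bytes = 256 one-bits

def dist_to_ref(descriptors):
    # Hamming distance of each descriptor to the reference: XOR, then
    # count the set bits per row. Output: one integer per descriptor.
    xored = np.bitwise_xor(descriptors, REF)
    return np.unpackbits(xored, axis=1).sum(axis=1)
```

The failure mode is that this maps 256-bit descriptors onto only 257 possible distance values, so many unrelated descriptors collide.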

Reducing to a smaller set of vectors

This is a half-baked idea: from each group of vectors at a given distance to the reference vector, take the K vectors nearest to the reference vector. The idea is to reduce the number of descriptor vectors to compare against.
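One way to make this idea rigorous rather than heuristic: Hamming distance is a metric, so by the triangle inequality the reference distance gives a lower bound on the true distance, and it can prune candidates without false negatives. A sketch, with `tol` as an assumed matching threshold:

```python
import numpy as np

def candidate_mask(query_dists, stored_dists, tol=8):
    # |d(a, ref) - d(b, ref)| <= d(a, b) (triangle inequality), so two
    # descriptors whose reference distances differ by more than `tol`
    # cannot themselves be within Hamming distance `tol` of each other.
    return np.abs(stored_dists[None, :] - query_dists[:, None]) <= tol
```

Only the surviving pairs need the exact (expensive) comparison.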

1 Answer


I would like to share some ideas on how to make the "brute force" approach (more) feasible. The theme throughout all of these ideas is to make the vector you use to represent an image as small as possible.

Assuming you want to use descriptors/keypoints as described in your question:

In this case, you want to transform your vector with high dimensionality into one with low dimensionality. Essentially, you want to throw away all unnecessary information. If this transformation is a linear map, you would be looking at PCA (Principal Component Analysis). For an example, see Principal Component Analysis for Dimensionality Reduction in Python.
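As a rough sketch with scikit-learn, assuming each image has already been summarized as one fixed-length vector (the aggregation step and the component count are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 8000)     # placeholder: 1000 images, 8000-dim vectors
pca = PCA(n_components=64).fit(X)  # fit once, offline

def embed(vec):
    # Cache this per image; all later comparisons run in 64 dimensions.
    return pca.transform(vec.reshape(1, -1))[0]
```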

Alternatively, you could use an autoencoder (neural network), which would be capable of expressing non-linear transforms as well (assuming you use non-linear activation functions).
An autoencoder usually has a part that encodes the input into a smaller vector and another part that decodes that smaller vector back into the input. After training both, you keep only the encoder, which receives an input (e.g. an image) and outputs a vector. Note that training the autoencoder is done once, on a powerful machine.
For an example, see Intro to Autoencoders.
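A compact Keras sketch of the encoder/decoder split described above; the input size and layer widths are arbitrary illustration values:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(64, 64, 1))
x = layers.Flatten()(inputs)
x = layers.Dense(256, activation="relu")(x)
code = layers.Dense(32, activation="relu", name="code")(x)  # the small vector
x = layers.Dense(256, activation="relu")(code)
x = layers.Dense(64 * 64, activation="sigmoid")(x)
outputs = layers.Reshape((64, 64, 1))(x)

autoencoder = Model(inputs, outputs)          # trained to reconstruct inputs
encoder = Model(inputs, code)                 # kept for inference
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(images, images, epochs=10)  # done once, offline
```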

If you don't actually care too much about the descriptors/keypoints:

You could consider training a neural network to find a vector representation suitable for image similarity for you. For an example, see Image similarity estimation using a Siamese Network with a triplet loss. Since you mentioned in a comment that you are building a mobile app, note that there are neural nets for mobile vision applications, like MobileNet.
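If you go that route, a pretrained MobileNet already works as an off-the-shelf embedder without any training of your own. Here is a sketch where the pooled features serve as the image signature (the similarity threshold would still need tuning for your data):

```python
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))

def embed(img):
    # img: HxWx3 uint8 array; output: one 1280-dim float vector.
    x = tf.image.resize(tf.cast(img, tf.float32), (224, 224))
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x)
    return model(tf.expand_dims(x, 0)).numpy()[0]

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```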

Regarding the time complexity

Keep in mind that what we are doing is applying a map to an input vector (e.g. image, keypoints, descriptors, etc.) that reduces it to a smaller vector. This transformation only needs to be done once; afterwards, you can cache the result. The time complexity of the transformation depends on the method and model you are using; an encoder with more parameters is more "expensive". You can find more information about PCA and its time complexity in the scikit-learn user guide.
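In code, the caching point amounts to something like this (`embed` stands for whichever transformation you chose above):

```python
cache = {}  # image_id -> small embedding vector

def get_embedding(image_id, image):
    # The expensive map runs once per image; lookups reuse the small vector.
    if image_id not in cache:
        cache[image_id] = embed(image)
    return cache[image_id]
```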

– ndaniel
  • Thank you Daniel! The autoencoder will receive an image and output one vector? I'm concerned, though, about the overhead and time complexity of the model, if it's less efficient than the ORB detector; same concern about time complexity for PCA. My use case here is that I'm building a mobile app, so this calculation is run on less powerful hardware, and adding more external libraries will also impact the performance of the app, hence I'm considering mainly "old school" methods. – ThunderWiring Jun 02 '22 at 05:56
  • I edited the answer to hopefully answer your questions. Keep in mind that these ideas aren't that new; for example, take a look at this paper from 1991: Eigenfaces for Recognition (in particular step 5 under "Summary of Eigenface Recognition Procedure" on page 6 of the PDF). – ndaniel Jun 02 '22 at 08:28