
Are there any effective and robust solutions for handling scaling and rotation in image recognition with neural networks (NNs)?

I see tons of sources on the Web explaining how neural networks are used for image recognition, but all of them avoid the topic of scaled or rotated images. A network trained on a pattern won't recognize that pattern once it is scaled or rotated.

Of course, there are some intuitive/naive workarounds/approaches:

  1. Brute force – rotate and scale the image until the NN recognizes it. Far too expensive (a sketch follows this list).
  2. Train the NN on every rotated and scaled variant of each image. This can be laborious and may dilute the network's accuracy (see the augmentation sketch below).
  3. Treat the rotations and scales of an original image as a cluster, and train the NN to recognize clusters and to interpolate/extrapolate within them. A little tricky to code and debug.
  4. For rotation, you can move to polar coordinates, which gives a kind of invariant both for recognizing patterns and for building histograms over specific portions of the image. But this requires finding the pivot point first, which is again quite expensive (see the log-polar sketch below).
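
For option 1, a minimal sketch of what that brute-force search could look like, assuming 2-D grayscale NumPy images and a hypothetical classifier `net.predict` that returns a `(label, score)` pair; the 15-degree step and the five scales are arbitrary choices, which is exactly the open question this approach leaves you with:

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def crop_or_pad(img, h, w):
    """Center-crop or zero-pad a 2-D array to shape (h, w)."""
    out = np.zeros((h, w), dtype=img.dtype)
    ih, iw = img.shape
    sy, dy = max((ih - h) // 2, 0), max((h - ih) // 2, 0)
    sx, dx = max((iw - w) // 2, 0), max((w - iw) // 2, 0)
    ch, cw = min(ih, h), min(iw, w)
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out

def transformed(image, angle, scale):
    """Rotate by `angle` degrees and zoom by `scale`, keeping the original shape."""
    out = rotate(image, angle, reshape=False, mode="nearest")
    out = zoom(out, scale, mode="nearest")
    return crop_or_pad(out, *image.shape)

def brute_force_recognize(image, net,
                          angles=range(0, 360, 15),
                          scales=(0.5, 0.75, 1.0, 1.5, 2.0)):
    """Run the classifier on every rotation/scale combination, keep the best score."""
    best = (None, -1.0)
    for angle in angles:
        for scale in scales:
            label, score = net.predict(transformed(image, angle, scale))
            if score > best[1]:
                best = (label, score)
    return best
```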
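
Option 2 is essentially training-set augmentation. A sketch of one way to do it, reusing the `transformed` helper above; the angle and scale ranges are illustrative guesses, and `net`/`dataset` are hypothetical:

```python
def augment(image, rng):
    """Return one randomly rotated and scaled copy of a 2-D image."""
    angle = rng.uniform(0.0, 360.0)   # continuous rotation, no fixed step
    scale = rng.uniform(0.5, 2.0)     # illustrative scale range
    return transformed(image, angle, scale)

# Training-loop sketch: each pass sees freshly transformed copies,
# so no augmented images ever need to be stored on disk.
# rng = np.random.default_rng(0)
# for epoch in range(epochs):
#     for image, label in dataset:
#         net.train_step(augment(image, rng), label)
```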
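
For option 4, a log-polar resampling goes a step beyond plain polar coordinates: rotation about the pivot becomes a cyclic shift along the angle axis, and uniform scaling becomes a shift along the log-radius axis, both of which a translation-tolerant network can absorb. A rough sketch, assuming the pivot is already known (finding it remains the expensive part, as noted above):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def log_polar(image, center=None, out_shape=(64, 64)):
    """Resample a 2-D image onto a log-polar grid around `center`."""
    h, w = image.shape
    if center is None:
        center = (h / 2.0, w / 2.0)        # placeholder: the pivot must still be found
    n_r, n_theta = out_shape
    max_r = np.hypot(h / 2.0, w / 2.0)     # largest radius worth sampling
    r = np.exp(np.linspace(0.0, np.log(max_r), n_r))               # log-spaced radii
    theta = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)  # uniform angles
    rr, tt = np.meshgrid(r, theta, indexing="ij")
    rows = center[0] + rr * np.sin(tt)
    cols = center[1] + rr * np.cos(tt)
    # bilinear sampling; pixels outside the image read as zero
    return map_coordinates(image, [rows, cols], order=1, mode="constant")
```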

Are there any better solutions, ideas, hints, references?

(I have read some answers to the rotation problem here, but what I saw doesn't cover the topic.)

  • @norbertk, of course. You don't even need to generate them; it is just the way the algorithm (some layer of it) converts the information before processing. Anyway, this is quite costly: you will need many cycles with different rotations and, worse, you won't know how many scales to use; you can only set some reasonable thresholds. Applying both rotations and scales multiplies the number of processing cycles, so the amount of work will be tremendous. – Damir Tenishev Sep 01 '21 at 18:40
  • @norbertk, I understood that. I'm just saying this can easily be optimized: an intermediate layer can hand the NN rotated and scaled results computed from the angle, the scale, and the original image, with no need to create image files and waste memory and throughput (see the sketch after these comments). The NN won't see the difference; for it, reading from memory (a file) is the same as reading from a stream (a function). Anyway, even in your case, just imagine how many files you would have to create and process; there would be tons of them. That is a blow to performance. – Damir Tenishev Sep 01 '21 at 21:23
  • @norbertk, well, let's consider your option: we generate the images on the fly. Say we recognize rigid real-life objects such as cars, airplanes, etc. How many images will we need? What rotation step is appropriate: one degree, ten degrees? How many scales? The combinations multiply quickly, and each one costs time for scaling, rotation, NN training, and so on, for every image. This works, but it is really slow; we need a way to do it faster. – Damir Tenishev Sep 02 '21 at 17:16
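
What the last two comments describe – generating the variants in a layer/stream instead of writing files – could look roughly like the generator below, reusing the hypothetical `augment` helper sketched under the question; the `copies` count is an arbitrary choice:

```python
def augmented_stream(dataset, rng, copies=8):
    """Yield (image, label) pairs on the fly; nothing is written to disk."""
    for image, label in dataset:
        yield image, label                    # the original
        for _ in range(copies):
            yield augment(image, rng), label  # a transformed variant
```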

0 Answers