How multi-scale CNN selects final output map

Question

A few days ago I read about multi-scale CNNs (the OverFeat method); the presentation is available via this link. They ran the CNN on several scales of an image and then combined all the output maps. The presentation says:

Classification performed at 6 scales at test time, but only 1 scale at run time.

So my question is: if we use 6 different scales of the CNN architecture, then we have different convolution layers at every scale (I guess so). So how does OverFeat use only 1 scale at run time? If we use a specific scale, how can we access the feature extractors of the other scales? I also see in the article that they combine feature maps from different scales, but I can't figure out how this process is performed.
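The core of the question is how outputs from several scales are merged. Below is a minimal sketch, assuming an OverFeat-style setup in which one set of weights is applied to every rescaled copy of the image, so there are no separate convolution layers per scale. A single hypothetical convolution layer (`kernels`) stands in for the whole network; each scale yields a class-score map of a different spatial size, and the maps are combined by taking the spatial max per class and then averaging over scales, roughly following the classification procedure described in the OverFeat paper (which also averages over horizontal flips, omitted here). The helper names, the nearest-neighbour resize, and the six scale values are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def conv2d_valid(image, kernels):
    """Naive 'valid' cross-correlation.

    image: (H, W) array; kernels: (C, k, k), one hypothetical filter per class.
    Returns a score map of shape (C, H - k + 1, W - k + 1).
    """
    c, k, _ = kernels.shape
    h, w = image.shape
    out = np.empty((c, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k]
            # Dot every filter with the same k x k patch -> (C,) scores
            out[:, i, j] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))
    return out


def resize_nearest(image, scale):
    """Nearest-neighbour rescale (a stand-in for a proper image resize)."""
    h, w = image.shape
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.clip((np.arange(new_h) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(new_w) / scale).astype(int), 0, w - 1)
    return image[np.ix_(rows, cols)]


def multiscale_scores(image, kernels, scales):
    """Apply the SAME kernels at every scale, then merge the score maps."""
    per_scale = []
    for s in scales:
        score_map = conv2d_valid(resize_nearest(image, s), kernels)
        # Map size depends on the scale; reduce it with a spatial max per class
        per_scale.append(score_map.max(axis=(1, 2)))
    # Average the per-scale class vectors -> one (C,) score vector
    return np.mean(per_scale, axis=0)


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))
kernels = rng.standard_normal((10, 5, 5))  # hypothetical 10-class "network"
scales = [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]  # six illustrative scales

for s in scales:
    # Same weights, different input size -> different score-map size
    print(s, conv2d_valid(resize_nearest(image, s), kernels).shape)
scores = multiscale_scores(image, kernels, scales)
print(scores.shape, int(scores.argmax()))
```

In this sketch, using a single scale just means passing one resized image through the same weights (`multiscale_scores(image, kernels, [1.0])`); nothing scale-specific has to be stored or looked up.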

score 1 · Answer 1 · answered Feb 03 '18 at 16:13

Think of this as varying the filter sizes and filter values. Each variant extracts a different representation (in other words, captures a different part of the image), and you then stack them to get a bigger feature vector. Then you do the featurisation. Also, look at dilated CNNs used for NLP; they are based on somewhat similar concepts.
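The idea of running filters of different widths (and, per the dilated-CNN pointer, different dilations) over the same input and stacking the results can be sketched as follows. This is a minimal illustration assuming a 1-D token sequence, as in the NLP case mentioned above; the filter widths, the dilation of 2, global max pooling, and all function names are hypothetical choices, not something the answer specifies.

```python
import numpy as np


def conv1d(x, kernels, dilation=1):
    """'Valid' 1-D cross-correlation with optional dilation.

    x: (T, D) sequence of D-dim embeddings; kernels: (F, k, D).
    Returns (F, T_out); dilation spaces the k taps `dilation` steps apart.
    """
    f, k, d = kernels.shape
    span = dilation * (k - 1) + 1  # receptive field of one filter
    t_out = x.shape[0] - span + 1
    out = np.empty((f, t_out))
    for t in range(t_out):
        window = x[t:t + span:dilation]  # (k, D): every `dilation`-th step
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out


def multi_width_features(x, filter_banks):
    """filter_banks: list of (kernels, dilation) pairs (hypothetical format).

    Each bank's response is max-pooled over time, then all banks are
    concatenated into one bigger feature vector.
    """
    pooled = [conv1d(x, kernels, dilation).max(axis=1)
              for kernels, dilation in filter_banks]
    return np.concatenate(pooled)


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 8))  # hypothetical: 20 tokens, 8-dim embeddings
banks = [
    (rng.standard_normal((4, 2, 8)), 1),  # 4 filters of width 2
    (rng.standard_normal((4, 3, 8)), 1),  # 4 filters of width 3
    (rng.standard_normal((4, 3, 8)), 2),  # width 3, dilation 2 -> span 5
]
features = multi_width_features(x, banks)
print(features.shape)  # (12,)
```

Each bank is reduced to one number per filter, so the stacked vector has a fixed length (3 banks of 4 filters = 12) regardless of sequence length.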
