A few days ago I read about a multi-scale CNN (the OverFeat method); the presentation is available via this link. They apply the CNN to an image at several different scales and then combine all of the output maps. The presentation states:
Classification performed at 6 scales at test time, but only 1 scale at run time.
So my question is: if we run the CNN architecture at 6 different scales, then I would guess each scale has its own convolution layers. How, then, does OverFeat use only 1 scale at run time? If we use one specific scale, how can we access the feature extractors of the other scales? I also see in the article that they combine the feature maps from the different scales, but I can't figure out how this is done.