Does it make sense to consider the base/backbone network one of the multiscale feature map blocks in SSD?

Question

I'm trying to understand Single Shot Multibox Detection (SSD) by following a book adopted at 500 universities in 70 countries. It says:

The complete single shot multibox detection model consists of five blocks. The feature maps produced by each block are used for both (i) generating anchor boxes and (ii) predicting classes and offsets of these anchor boxes. Among these five blocks, the first one is the base network block, the second to the fourth are downsampling blocks, and the last block uses global max-pooling to reduce both the height and width to 1. Technically, the second to the fifth blocks are all those multiscale feature map blocks in Fig. 14.7.1.

[Figure: Fig. 14.7.1 from the book, referenced in the excerpt above]

I understand that the second to the fifth blocks are multiscale feature map blocks. My only question is whether the base network is also a feature map block. I suspect it is, although the book doesn't say so explicitly, so I'd like confirmation.

Hi @singularli, and welcome to AI Stack Exchange! If possible, please outline your question a little more explicitly. That might help in getting more answers. — DeepQZero, Dec 19 '23 at 20:20

score 2 · Accepted Answer · answered Dec 19 '23 at 05:34

Yes, you are correct. The base network block is also a feature map block.

The base network block extracts features from the input image. It stacks down-sampling CNN blocks whose channel counts follow [3, 16, 32, 64], i.e. 3 → 16 → 32 → 64. It outputs feature maps that are used both for generating anchor boxes and for predicting the classes and offsets of those anchor boxes.

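import torch
from torch import nn

# down_sample_blk and forward are assumed to be defined as in the book's SSD section.
def base_net():
    blk = []
    num_filters = [3, 16, 32, 64]  # channels: 3 -> 16 -> 32 -> 64
    for i in range(len(num_filters) - 1):
        blk.append(down_sample_blk(num_filters[i], num_filters[i+1]))
    return nn.Sequential(*blk)

# Shape of the base network's feature map for a batch of two 256 x 256 RGB images
forward(torch.zeros((2, 3, 256, 256)), base_net()).shape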
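
The snippet above relies on down_sample_blk and forward from the book, so it does not run on its own. Below is a minimal self-contained sketch, assuming helper definitions modeled on the book's (two 3x3 convolutions with batch norm and ReLU, then 2x2 max-pooling). The functions get_blk and feature_map_shapes are written here for illustration and should be treated as assumptions, not the book's exact code. It passes a dummy batch through all five blocks and records the shape of the feature map each one produces, including the base network's.

```python
import torch
from torch import nn


def down_sample_blk(in_channels, out_channels):
    """Two 3x3 conv + BN + ReLU layers, then 2x2 max-pooling (halves H and W)."""
    layers = []
    for _ in range(2):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)


def base_net():
    """Block 1: three down-sampling blocks, channels 3 -> 16 -> 32 -> 64."""
    num_filters = [3, 16, 32, 64]
    return nn.Sequential(*[
        down_sample_blk(num_filters[i], num_filters[i + 1])
        for i in range(len(num_filters) - 1)
    ])


def get_blk(i):
    """Return block i (0-based) of the five-block model (illustrative helper)."""
    if i == 0:
        return base_net()                    # base network block
    if i == 1:
        return down_sample_blk(64, 128)      # first down-sampling block
    if i == 4:
        return nn.AdaptiveMaxPool2d((1, 1))  # global max-pooling to 1 x 1
    return down_sample_blk(128, 128)         # remaining down-sampling blocks


def feature_map_shapes(x):
    """Chain the five blocks and collect the shape of each block's output."""
    shapes = []
    with torch.no_grad():
        for i in range(5):
            x = get_blk(i)(x)
            shapes.append(tuple(x.shape))
    return shapes


shapes = feature_map_shapes(torch.zeros((2, 3, 256, 256)))
for i, s in enumerate(shapes, start=1):
    print(f"block {i}: {s}")
```

With a 256 x 256 input, block 1 (the base network) yields a (2, 64, 32, 32) feature map, and blocks 2 to 5 yield 16 x 16, 8 x 8, 4 x 4 and 1 x 1 maps with 128 channels. Every block, the base network included, emits a feature map at its own scale.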

reference

Hiren Namera

Thank you. So the original phrasing seems somewhat misleading. A clearer way to put it might be 'All 5 blocks are multiscale feature map blocks', right? – singularli Dec 20 '23 at 15:04
Yes, exactly: all five blocks produce feature maps. – Hiren Namera Dec 20 '23 at 15:16
