Determining size of FC layer after Conv layer in PyTorch

Question

I am learning PyTorch and CNNs but am confused about how the number of inputs to the first FC layer after a Conv2D layer is calculated.

My network architecture is shown below. Here is my reasoning, using the calculation as explained here.

The input images will have shape (1 x 28 x 28).

The first Conv layer has stride 1, padding 0, depth 6 and we use a (4 x 4) kernel. The output will thus be (6 x 24 x 24), because the new volume is (28 - 4 + 2*0)/1.

Then we pool this with a (2 x 2) kernel and stride 2 so we get an output of (6 x 11 x 11), because the new volume is (24 - 2)/2.

Same thing for the second Conv and pool layers, but this time with a (3 x 3) kernel in the Conv layer, resulting in (16 x 3 x 3) feature maps in the end.

My assumption would then be that the first linear layer should have 144 inputs (16 * 3 * 3), but when I calculate the inputs programmatically, I get 400. What did I miss?

import torch.nn as nn
import torch.nn.functional as F

# `classes` is assumed to be defined elsewhere, e.g. a tuple of class labels.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features  # 400, not 144

Related, but less so: is there any reasoning people use to choose a good kernel size, number of layers and number of pooling layers, or does everyone just look at what the SOTA papers do?

I'd suggest simply adding print(x.shape) in forward() once to see the exact values. — LAKSHYA SINGH, Apr 06 '21 at 12:21

score 12 · Accepted Answer · answered Nov 10 '18 at 09:31

Hello and welcome to Stack Exchange!

The answer to your question is quite simple: you did not use the correct formula.

The formula you used is (assuming we are working with square inputs)

$$ W'=\frac{W-F+2P}{S} $$

but the correct formula (with the division rounded down when it is not exact) is

$$ W'=\left\lfloor\frac{W-F+2P}{S}\right\rfloor+1 $$
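As a sketch, the corrected formula translates directly into a small Python helper. The name `conv_output_size` is hypothetical, and it assumes square inputs, dilation 1 and the division rounded down (PyTorch's default; `ceil_mode=True` on a pooling layer would round up instead).

```python
def conv_output_size(w, kernel, padding=0, stride=1):
    """Spatial size after a conv or pooling layer on a square input.

    Implements W' = floor((W - F + 2P) / S) + 1, assuming dilation 1 and
    floor rounding (ceil_mode=False for pooling layers).
    """
    return (w - kernel + 2 * padding) // stride + 1


# The asker's first conv layer: 28x28 input, 4x4 kernel, no padding, stride 1.
print(conv_output_size(28, 4))            # 25, not 24
# 2x2 max pooling with stride 2 on the 25x25 result.
print(conv_output_size(25, 2, stride=2))  # 12, not 11
```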

Now if we redo your calculations starting with $(1 \times 28 \times 28)$ inputs:

$$ \begin{aligned} W^{(1)}&=28-4+1=25\\ W^{(2)}&=\left\lfloor\frac{25-2}{2}\right\rfloor+1=12\\ W^{(3)}&=12-3+1=10\\ W^{(4)}&=\left\lfloor\frac{10-2}{2}\right\rfloor+1=5 \end{aligned} $$

Considering that the second convolution layer has 16 output channels (or feature maps), you can indeed then calculate the number of inputs as $16\cdot5^2=400$.
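To see where 400 comes from without touching `forward()`, the same arithmetic can be chained layer by layer. This is a pure-Python sketch, not PyTorch code: `trace_shapes` and the `(name, out_channels, kernel, stride)` tuples are a hypothetical description of the question's network, assuming no padding (ReLU leaves the shape unchanged, so it is omitted).

```python
def conv_output_size(w, kernel, padding=0, stride=1):
    # W' = floor((W - F + 2P) / S) + 1
    return (w - kernel + 2 * padding) // stride + 1


def trace_shapes(in_shape, layers):
    """Print (C x H x W) after each layer and return the flattened size.

    Each layer is (name, out_channels, kernel, stride); out_channels=None
    means the layer keeps the channel count, as pooling does.
    """
    c, h, w = in_shape
    for name, out_c, kernel, stride in layers:
        c = out_c if out_c is not None else c
        h = conv_output_size(h, kernel, stride=stride)
        w = conv_output_size(w, kernel, stride=stride)
        print(f"{name}: ({c} x {h} x {w})")
    return c * h * w


# The network from the question.
layers = [
    ("conv1", 6, 4, 1),
    ("pool1", None, 2, 2),
    ("conv2", 16, 3, 1),
    ("pool2", None, 2, 2),
]
print(trace_shapes((1, 28, 28), layers))  # 400
```

It prints (6 x 25 x 25), (6 x 12 x 12), (16 x 10 x 10) and (16 x 5 x 5) for the four layers, matching the calculation above.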

score 5 · Answer 2 · edited May 28 '21 at 14:05

If you are willing to pass the input dimensions to the CNN as an additional parameter, you can calculate the size automatically. For MNIST, input_dim=(1, 28, 28), so I can calculate it like this:

import functools
import operator

import torch
from torch import nn


class CNN(nn.Module):
    """Basic PyTorch CNN implementation"""

    def __init__(self, in_channels, out_channels, input_dim):
        nn.Module.__init__(self)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=20, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        # Pass a dummy batch of one input through the feature extractor and
        # multiply the dimensions of the output (the batch size of 1 does not
        # change the product).
        with torch.no_grad():
            dummy_out = self.feature_extractor(torch.rand(1, *input_dim))
        num_features_before_fcnn = functools.reduce(operator.mul, list(dummy_out.shape))

        self.classifier = nn.Sequential(
            nn.Linear(in_features=num_features_before_fcnn, out_features=100),
            nn.Linear(in_features=100, out_features=out_channels),
        )

    def forward(self, x):
        batch_size = x.size(0)

        out = self.feature_extractor(x)
        out = out.view(batch_size, -1)  # flatten the vector
        out = self.classifier(out)
        return out

score 2 · Answer 3 · answered Aug 02 '19 at 04:22

You can use torch.nn.AdaptiveMaxPool2d to force a specific output size.

For example, if I set nn.AdaptiveMaxPool2d((5,7)), I am forcing each feature map to be 5 x 7. Then you can just multiply that by the out_channels of your previous Conv2d layer.

https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d

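import torch.nn as nn
import torch.nn.functional as F

# `classes` is assumed to be defined elsewhere, e.g. a tuple of class labels.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.adapt = nn.AdaptiveMaxPool2d((5,7))  # output is always 16 x 5 x 7
        self.fc1 = nn.Linear(16*5*7, 120)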
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
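To illustrate why the FC input size no longer depends on the image size, here is a pure-Python sketch of how adaptive pooling splits one dimension into windows. It assumes the window rule start = floor(i * in / out), end = ceil((i + 1) * in / out), which is my understanding of what PyTorch's adaptive pooling uses; `adaptive_bins` is a hypothetical name. In this network, conv2 outputs 16 x 10 x 10 feature maps for 28 x 28 inputs.

```python
def adaptive_bins(in_size, out_size):
    """Return (start, end) index pairs of the pooling windows along one dimension.

    Assumed rule: start = floor(i * in / out), end = ceil((i + 1) * in / out),
    computed with integer arithmetic to avoid floating-point rounding.
    """
    bins = []
    for i in range(out_size):
        start = (i * in_size) // out_size
        end = ((i + 1) * in_size + out_size - 1) // out_size  # ceiling division
        bins.append((start, end))
    return bins


rows = adaptive_bins(10, 5)  # height 10 -> 5 windows
cols = adaptive_bins(10, 7)  # width 10 -> 7 windows
print(rows)  # [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
print(cols)  # [(0, 2), (1, 3), (2, 5), (4, 6), (5, 8), (7, 9), (8, 10)]

# A different input width still yields 7 windows.
print(len(adaptive_bins(13, 7)))  # 7
print(16 * len(rows) * len(cols))  # 560 = 16 * 5 * 7, fc1's in_features
```

The windows may overlap, but their count always equals the requested output size, so fc1 always receives 16 * 5 * 7 features.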

Hi, thanks for this method. How did you determine/choose the dimension (5,7)? Will a different shape influence the final performance? — Veronica Cheng, May 25 '20 at 11:03

Answer 4 · answered by Anil Bora Yayak, Mar 23 '20 at 18:21

I added a method to the PyTorch model that determines the number of input neurons for the first linear layer automatically; hopefully it will be helpful for anyone struggling with the calculations.

import torch
import torch.nn as nn
import torch.nn.functional as F

# `classes` (the number of output classes, an int) is assumed to be defined elsewhere.

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # in_channels = color channels, out_channels = number of filters
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3)
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        self.neurons = self.linear_input_neurons()

        self.fc1 = nn.Linear(self.neurons, 1000)
        self.fc2 = nn.Linear(1000, 500)
        self.fc3 = nn.Linear(500, classes)

    def forward(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x)))
        x = x.view(-1, self.neurons)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    # Apply the convolution and pooling operations that precede the linear
    # layer and return the 4-dimensional size of the result.
    def size_after_relu(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x)))

        return x.size()

    # Multiply all elements of the size returned above to get the number of
    # inputs to the first linear layer.
    def linear_input_neurons(self):
        with torch.no_grad():
            size = self.size_after_relu(torch.rand(1, 1, 64, 32))  # image size: 64x32
        m = 1
        for i in size:
            m *= i

        return int(m)
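One caveat with this approach is that the result depends on the dummy size hard-coded in `linear_input_neurons()` (64 x 32): if the real images have another size, fc1 will not match. A quick pure-Python check with the conv/pool size formula, assuming the layer settings above (no padding, dilation 1); `answer4_linear_inputs` is a hypothetical helper name.

```python
def conv_output_size(w, kernel, padding=0, stride=1):
    # W' = floor((W - F + 2P) / S) + 1
    return (w - kernel + 2 * padding) // stride + 1


def answer4_linear_inputs(height, width):
    """Flattened size after conv1 -> pool -> conv2 -> pool in the CNN above."""
    sizes = []
    for s in (height, width):
        s = conv_output_size(s, 3)            # conv1, 3x3 kernel
        s = conv_output_size(s, 2, stride=2)  # maxpool, 2x2, stride 2
        s = conv_output_size(s, 5)            # conv2, 5x5 kernel
        s = conv_output_size(s, 2, stride=2)  # maxpool again
        sizes.append(s)
    return 64 * sizes[0] * sizes[1]           # conv2 has 64 output channels


print(answer4_linear_inputs(64, 32))  # 4160, for the hard-coded 64x32 dummy input
print(answer4_linear_inputs(28, 28))  # 1024, for MNIST-sized inputs
```

Passing the real input size in as a constructor argument, rather than hard-coding it, avoids this mismatch.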

Welcome to DS StackExchange. Please add some description to your code, so that other users can understand it more clearly. Thank you — Leevo, Mar 20 '20 at 12:56
