
I am learning PyTorch and CNNs but am confused how the number of inputs to the first FC layer after a Conv2D layer is calculated.

My network architecture is shown below. Here is my reasoning, using the calculation as explained here.

The input images will have shape (1 x 28 x 28).

The first Conv layer has stride 1, padding 0, depth 6 and we use a (4 x 4) kernel. The output will thus be (6 x 24 x 24), because the new volume is (28 - 4 + 2*0)/1.

Then we pool this with a (2 x 2) kernel and stride 2 so we get an output of (6 x 11 x 11), because the new volume is (24 - 2)/2.

Same thing for the second Conv and pool layers, but this time with a (3 x 3) kernel in the Conv layer, resulting in (16 x 3 x 3) feature maps in the end.

My assumption would then be that the first linear layer should have 144 inputs (16 * 3 * 3), but when I calculate the inputs programmatically, I get 400. What did I miss?

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(400, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))  # classes is defined elsewhere

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features # 400, not 144

Related but less so: is there a reasoning used by people to get a good kernel size, number of layers and number of pool layers or does everyone just look at what the SOTA papers do?

bmurauer
Into Jo

4 Answers


Hello and welcome to Stack Exchange!

The answer to your question is quite simple: you did not use the correct formula.

The formula you used is (assuming we are working with square inputs)

$$ W'=\frac{W-F+2P}{S} $$

but the correct formula, rounding down when the division is not exact, is

$$ W'=\left\lfloor\frac{W-F+2P}{S}\right\rfloor+1 $$

Now if we redo your calculations starting with $(1 \times 28 \times 28)$ inputs:

$$ \begin{aligned} W^{(1)}&=28-4+1=25\\ W^{(2)}&=\left\lfloor\frac{25-2}{2}\right\rfloor+1=12\\ W^{(3)}&=12-3+1=10\\ W^{(4)}&=\left\lfloor\frac{10-2}{2}\right\rfloor+1=5 \end{aligned} $$

Considering that the second convolution layer has 16 output channels (or feature maps), you can indeed then calculate the number of inputs as $16\cdot5^2=400$.
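
As a quick check, the calculation above can be sketched in plain Python. The helper name `conv_output_size` is my own and not part of PyTorch. It assumes square inputs, no dilation, and that pooling layers follow the same size rule as convolutions.

```python
def conv_output_size(w, f, p=0, s=1):
    """Output width of a conv or pooling layer: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


w = 28
w = conv_output_size(w, f=4)        # conv1, 4x4 kernel -> 25
w = conv_output_size(w, f=2, s=2)   # 2x2 max pool      -> 12
w = conv_output_size(w, f=3)        # conv2, 3x3 kernel -> 10
w = conv_output_size(w, f=2, s=2)   # 2x2 max pool      -> 5

fc1_inputs = 16 * w * w             # 16 feature maps of 5 x 5
print(fc1_inputs)                   # 400
```

Dropping the `+ 1` from the helper reproduces the 24, 11, 8, 3 sequence from the question, which is where the 144 came from.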

RaptorDotCpp

If you are willing to pass the input dimensions to the CNN as an additional parameter, the size can be calculated automatically. For MNIST the input dimensions are input_dim=(1, 28, 28), so I calculate it like this:

import functools
import operator

import torch
from torch import nn


class CNN(nn.Module):
    """Basic PyTorch CNN implementation"""

    def __init__(self, in_channels, out_channels, input_dim):
        nn.Module.__init__(self)
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(in_channels=in_channels, out_channels=20, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),

            nn.Conv2d(in_channels=20, out_channels=50, kernel_size=3, stride=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2),
        )

        # Pass a dummy batch of one image through the feature extractor and
        # multiply the output dimensions to get the flattened feature count.
        num_features_before_fcnn = functools.reduce(operator.mul, list(self.feature_extractor(torch.rand(1, *input_dim)).shape))

        self.classifier = nn.Sequential(
            nn.Linear(in_features=num_features_before_fcnn, out_features=100),
            nn.Linear(in_features=100, out_features=out_channels),
        )

    def forward(self, x):
        batch_size = x.size(0)

        out = self.feature_extractor(x)
        out = out.view(batch_size, -1)  # flatten each sample to a vector
        out = self.classifier(out)
        return out
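
The dummy forward pass can also be pulled out into a small standalone helper. The following is a minimal sketch, not part of the answer's code: the function name `infer_flat_features` is hypothetical, and it assumes the layers are the same as in the feature extractor above and that the input is MNIST-shaped.

```python
import torch
from torch import nn


def infer_flat_features(feature_extractor, input_dim):
    """Count features per sample by passing one dummy input through the layers."""
    with torch.no_grad():  # no gradients are needed for a shape probe
        dummy = torch.zeros(1, *input_dim)
        out = feature_extractor(dummy)
    return out[0].numel()  # number of elements in a single sample


feature_extractor = nn.Sequential(
    nn.Conv2d(1, 20, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(20, 50, kernel_size=3, stride=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
)

n = infer_flat_features(feature_extractor, (1, 28, 28))
print(n)  # 50 * 5 * 5 = 1250
```

Indexing the first sample before calling `numel()` keeps the batch dimension out of the product, so the result would not change if the dummy batch had more than one image.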

bmurauer
komunistbakkal

You can use torch.nn.AdaptiveMaxPool2d to set a specific output.

For example, if I set nn.AdaptiveMaxPool2d((5,7)), I force the feature map to be 5 x 7, whatever its size was before. The number of inputs to the first linear layer is then 5 * 7 multiplied by the out_channels of your previous Conv2d layer.

https://pytorch.org/docs/stable/nn.html#torch.nn.AdaptiveMaxPool2d

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 4)
        self.conv2 = nn.Conv2d(6, 16, 3)
        self.adapt = nn.AdaptiveMaxPool2d((5,7))
        self.fc1 = nn.Linear(16*5*7, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, len(classes))

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = self.adapt(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*7)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
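
To see why the output size is fixed, here is a rough pure-Python sketch of how an adaptive pooling layer might split one axis into windows. The helper `adaptive_pool_windows` is hypothetical, and the floor/ceil index rule is my understanding of PyTorch's behaviour rather than something stated in this answer, so treat it as an assumption. The input size of 10 comes from 28 -> 25 (conv1) -> 12 (pool) -> 10 (conv2).

```python
def adaptive_pool_windows(in_size, out_size):
    """Return (start, end) index pairs, end exclusive, one per output cell."""
    windows = []
    for i in range(out_size):
        start = (i * in_size) // out_size                      # floor(i * in / out)
        end = ((i + 1) * in_size + out_size - 1) // out_size   # ceil((i + 1) * in / out)
        windows.append((start, end))
    return windows


rows = adaptive_pool_windows(10, 5)  # height: 10 -> 5
cols = adaptive_pool_windows(10, 7)  # width:  10 -> 7
print(rows)
print(cols)

# One output cell per window, so fc1 always sees 16 * 5 * 7 inputs.
fc1_inputs = 16 * len(rows) * len(cols)
print(fc1_inputs)  # 560
```

Under this assumption the windows may overlap or differ in length when the sizes do not divide evenly, but their count always equals the requested output size.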
  • Hi thanks for this method - how did you determine/choose the dimension (5,7)? Will different shape influence the final performance? – Veronica Cheng May 25 '20 at 11:03

I added a method to my PyTorch model that determines the number of input neurons of the first linear layer automatically. Hopefully it will be helpful for anyone struggling with the calculations.

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # in_channels: color channels, out_channels: number of filters
        self.conv1 = nn.Conv2d(in_channels= 1, out_channels= 32, kernel_size= 3)
        self.maxpool = nn.MaxPool2d(kernel_size= 2, stride= 2)
        self.conv2 = nn.Conv2d(32, 64, 5)
        # run the dummy forward pass once and reuse the result
        self.neurons = self.linear_input_neurons()

        self.fc1 = nn.Linear(self.neurons, 1000)
        self.fc2 = nn.Linear(1000, 500)
        self.fc3 = nn.Linear(500, classes)

    def forward(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))
        x = x.view(-1, self.neurons)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return x

    # here we apply convolution operations before linear layer, and it returns the 4-dimensional size tensor. 
    def size_after_relu(self, x):
        x = self.maxpool(F.relu(self.conv1(x.float())))
        x = self.maxpool(F.relu(self.conv2(x.float())))

        return x.size()


    # after obtaining the size in above method, we call it and multiply all elements of the returned size.
    def linear_input_neurons(self):
        size = self.size_after_relu(torch.rand(1, 1, 64, 32)) # image size: 64x32
        m = 1
        for i in size:
            m *= i

        return int(m)
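
The value that `linear_input_neurons` returns for the 64x32 dummy image can be cross-checked by hand. This is a small sketch under my own assumptions: the helper `trace_shape` is hypothetical, padding is zero everywhere, and each layer is described as a (kernel, stride) pair matching the model above.

```python
def trace_shape(height, width, layers):
    """Apply floor((size - kernel) / stride) + 1 to both axes for each layer."""
    for kernel, stride in layers:
        height = (height - kernel) // stride + 1
        width = (width - kernel) // stride + 1
    return height, width


# conv1 (3x3), max pool (2x2, stride 2), conv2 (5x5), max pool (2x2, stride 2)
layers = [(3, 1), (2, 2), (5, 1), (2, 2)]
h, w = trace_shape(64, 32, layers)
neurons = 64 * h * w  # 64 output channels from conv2
print(h, w, neurons)  # 13 5 4160
```

Because the image is not square, height and width have to be tracked separately: 64 -> 62 -> 31 -> 27 -> 13 and 32 -> 30 -> 15 -> 11 -> 5.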
  • Welcome to DS StackExchange. Please add some description to your code, so that other users can understand it more clearly. Thank you – Leevo Mar 20 '20 at 12:56