
I'm confused by the concept of equating a 1x1 convolution with a fully connected layer. Take the following simple example of a 1x1 convolution over 2 input channels, each of size 2x2, with a single output channel.

[Image: the 1x1 convolution example, with inputs and outputs colour coded]

The only way I can relate this to fully connected layers is to say that there are 4 fully connected layers, one for each location in the input feature map (inputs and outputs colour coded).

From what I can understand, my interpretation is consistent with the Network in Network paper [Lin et al. 2013], which describes the 1x1 convolution as being equivalent to cross channel parametric pooling:

"The cross channel parametric pooling layer is also equivalent to a convolution layer with 1x1 convolution kernel."

I have seen this one from Yann LeCun equating 1x1 convolutions to a fully connected layer. I have also read this answer, and I'm just not seeing the equivalence between a 1x1 convolution over an input volume and a single fully connected layer.

Any insight would be appreciated; if you can, please relate it back to the example above. Thanks!

Ethan
nixon

1 Answer


The interpretation in the OP is correct: the 1x1 convolution given there can be duplicated with four separate fully-connected layers, one per spatial location and all sharing the same weights (see diagram). Also, in at least some implementations, the kernel weights of a 1x1 convolution are trainable in the same way as the weights of a fully-connected layer.

That said, not every fully-connected layer can be mathematically duplicated by an equivalent 1x1 convolution. By definition, a 1x1 convolution performs a "column-wise dot product": for each output channel, every pixel column of a multi-channel feature map (the values at one spatial location across all channels) is reduced to a single number (pixel). A fully-connected layer applied to the whole input mixes weights differently, because each output can depend on every input value through its own weight.

In summary, fully connected layers and 1x1 convolutions each have their own use cases. Some overlap among these use cases exists; however, the two are not mathematically equivalent in a general sense.

Four separate "dense layers" equivalent to the 1x1 convolution in OP
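
To make the diagram concrete, here is a minimal sketch in plain Python (no libraries) that uses the shapes from the OP: 2 input channels of size 2x2 and a single output channel. The input values, the kernel weights `w`, the bias `b`, and the helper names `conv1x1` and `dense` are made up for illustration; they are not taken from the question or from any particular framework. It checks that the 1x1 convolution gives the same result as a one-unit dense layer applied at each of the four locations with shared weights, and then compares parameter counts against an unconstrained fully-connected layer over the flattened input.

```python
# Illustrative values only (assumptions, not from the original post).
# Input: 2 channels, each 2x2, indexed as x[channel][row][col].
x = [
    [[1.0, 2.0], [3.0, 4.0]],  # channel 0
    [[5.0, 6.0], [7.0, 8.0]],  # channel 1
]
w = [0.5, -1.0]  # 1x1 kernel: one weight per input channel
b = 0.25         # single bias for the single output channel


def conv1x1(x, w, b):
    """1x1 convolution, one output channel: dot product over channels at each pixel."""
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [sum(w[c] * x[c][i][j] for c in range(channels)) + b for j in range(width)]
        for i in range(height)
    ]


def dense(inputs, weights, bias):
    """Fully connected layer with a single output unit."""
    return sum(wi * xi for wi, xi in zip(weights, inputs)) + bias


conv_out = conv1x1(x, w, b)

# Four "dense layers", one per spatial location, all sharing the same w and b.
dense_out = [
    [dense([x[c][i][j] for c in range(2)], w, b) for j in range(2)]
    for i in range(2)
]

assert conv_out == dense_out

# Parameter counts for this example.
conv_params = len(w) + 1                     # 2 weights + 1 bias = 3
fc_params = (2 * 2 * 2) * (2 * 2) + (2 * 2)  # 8 inputs x 4 outputs + 4 biases = 36
print(conv_out, conv_params, fc_params)
```

The 1x1 convolution has 3 parameters here, while a general fully-connected layer mapping the 8 flattened inputs to 4 outputs has 36. The convolution corresponds to the special case of that layer in which each output only sees the inputs at its own location and all locations share the same weights, which is why the equivalence holds in one direction but not in general.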

Aether