I'm attempting to write a forward pass of a CNN but I'm stuck on the second convolutional layer.
From what I understand, given a 28x28 image, a first layer of 10 filters of size 3x3, a second layer of 20 filters of size 3x3, and max pooling after each layer, the shape should go from 28x28 > 10x28x28 > 10x14x14 > 20x14x14 > 20x7x7 (assuming a convolution stride of 1).
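To make the expected shape chain concrete, here is a rough sketch of the bookkeeping only (no actual convolution). It assumes 'same' padding for the convolutions and 2x2 max pooling, which the 28 > 14 > 7 halving implies but which is not stated explicitly; the helper names are made up for illustration.

```python
def conv_same_shape(in_shape, n_filters):
    """Shape after a stride-1, 'same'-padded conv layer: only the channel count changes."""
    _, h, w = in_shape
    return (n_filters, h, w)


def max_pool_shape(in_shape, size=2):
    """Shape after non-overlapping size x size max pooling: height and width shrink."""
    c, h, w = in_shape
    return (c, h // size, w // size)


shape = (1, 28, 28)  # a single-channel 28x28 image
trace = [shape]
for n_filters in (10, 20):  # assumed filter counts for layers 1 and 2
    shape = conv_same_shape(shape, n_filters)
    trace.append(shape)
    shape = max_pool_shape(shape)
    trace.append(shape)

print(trace)
```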
My initial intuition says the shape after the second filter should be 20x10x14x14. However, after some digging, my understanding is that the sliding window is treated as a sliding 'block' that spans all input channels.
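A minimal sketch of that 'block' reading, assuming each second-layer filter has shape 10x3x3 (one 3x3 slice per input channel) and that the per-channel results are summed into a single feature map. The function name `conv_layer` and the random inputs are placeholders for illustration, not part of my actual program.

```python
import numpy as np
from scipy import signal


def conv_layer(x, filters):
    """x: (C_in, H, W); filters: (C_out, C_in, kH, kW) -> output (C_out, H, W)."""
    out = []
    for f in filters:
        # Correlate each input channel with its matching 2D kernel slice,
        # then sum over channels so one filter yields one 2D feature map.
        fmap = sum(
            signal.correlate(x[c], f[c], mode="same") for c in range(x.shape[0])
        )
        out.append(fmap)
    return np.array(out)


rng = np.random.default_rng(0)
x = rng.standard_normal((10, 14, 14))          # stand-in for the pooled first-layer output
filters = rng.standard_normal((20, 10, 3, 3))  # 20 filters, each spanning all 10 channels
y = conv_layer(x, filters)
print(y.shape)
```

With this layout the channel axis is consumed by the sum, so the result has one map per filter rather than one per filter-channel pair.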
Currently my program outputs 20x10x14x14 using the scipy.signal.correlate function:
import numpy as np
import scipy.signal

input_ = x_train[0]  # a single 28x28 image

# first convolutional layer
filters = np.random.randn(20, 3, 3)
temp = []
for i in range(20):
    temp.append(scipy.signal.correlate(input_, filters[i], 'same'))
input_ = np.array(temp)
print(input_.shape)

# second convolutional layer
filters = np.random.randn(10, 20, 3, 3)
temp = []
for i in range(10):
    temp.append(scipy.signal.correlate(input_, filters[i], 'same'))
input_ = np.array(temp)
print(input_.shape)
I would be grateful for any help.