According to the Attention Is All You Need paper, the Transformer's encoder is described as follows:
The encoder is composed of a stack of N = 6 identical layers. Each layer has two sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-wise fully connected feed-forward network.
How are the outputs of the N = 6 identical layers put together? Is it via concatenation, summation, an element-wise product, or are the 6 blocks simply placed in sequence, with each layer's output feeding the next (see the sketch below)? Is the same also done for the decoder?
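
To make the question concrete, here is a minimal sketch in plain Python of the two interpretations I have in mind. The `layers` argument is a hypothetical list of callables, each standing in for one self-attention + feed-forward block; this is only to illustrate the question, not the paper's actual implementation.

```python
def encoder_sequential(x, layers):
    """Interpretation 1: the 6 layers are stacked in sequence,
    each layer consuming the previous layer's output."""
    for layer in layers:
        x = layer(x)
    return x


def encoder_combined(x, layers):
    """Interpretation 2: all 6 layers see the same input and their
    outputs are combined afterwards (here, by summation)."""
    return sum(layer(x) for layer in layers)
```

Which of these (if either) matches what the paper means by "a stack of N = 6 identical layers"?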