Popular models such as the Transformer add the positional encoding to the existing feature dimensions of the embedding. Why is this preferred over concatenating extra features along the tensor's feature dimension to hold the positional information?
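To make the comparison concrete, here is a minimal sketch (NumPy, with hypothetical sizes `seq_len`, `d_model`, `d_pos`) of the two options being contrasted: summing a sinusoidal positional encoding into the existing embedding dimensions versus concatenating it as extra feature dimensions.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal encoding of shape (seq_len, d_model)."""
    positions = np.arange(seq_len)[:, None]               # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                 # even indices: sin
    pe[:, 1::2] = np.cos(angles[:, 1::2])                 # odd indices: cos
    return pe

seq_len, d_model, d_pos = 10, 512, 64                     # hypothetical sizes
token_embeddings = np.random.randn(seq_len, d_model)

# Option 1 (the Transformer approach): sum into existing dimensions,
# shape stays (seq_len, d_model)
summed = token_embeddings + sinusoidal_positional_encoding(seq_len, d_model)

# Option 2 (what the question proposes): concatenate extra positional features,
# shape grows to (seq_len, d_model + d_pos), so downstream weight matrices grow too
concatenated = np.concatenate(
    [token_embeddings, sinusoidal_positional_encoding(seq_len, d_pos)], axis=-1
)

print(summed.shape)        # (10, 512)
print(concatenated.shape)  # (10, 576)
```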
Does this answer your question? In a Transformer model, why does one sum positional encoding to the embedding rather than concatenate it? – noe Mar 12 '24 at 22:15