λ
home posts study uses projects

Positional Encoding

Jul 18, 2024

machine-learning

When performing a Vector Embedding of a sentence, you would like retain the positional information of each word when passing your input to your model. This way the model can learn to recognize different uses of the words in the sentence.

707

The authors of the Attention Is All You Need paper decided to use trigonometric functions to encode the positional information. The coscos and sinsin functions have the nice property that the values are constrained between [1,1][-1, 1]. They also follow a pattern which means that the model can learn accordingly. This is the reason why we don’t use the index of the token itself – it has a value in the range [0,d][0, d] which means that words that appear further in the sentence will have a bigger value. They decided to encode odd vector indexes with coscos and even vector indexes with sinsin:

PE(pos,2i)=sin(pos100002idmodel)PE(pos,2i+1)=cos(pos100002idmodel)\begin{align*} \text{{PE}}(pos, 2i) &= \sin\left(\frac{{pos}}{{10000^{\frac{{2i}}{{d_{\text{{model}}}}}}}}\right) \\ \text{{PE}}(pos, 2i+1) &= \cos\left(\frac{{pos}}{{10000^{\frac{{2i}}{{d_{\text{{model}}}}}}}}\right) \end{align*}

After calculating the positional embedding for the given position, it’s added up together with the corresponding word vector embedding.

References

LλBS

All systems up and running

Commit 84c1d46, deployed on Jan 22, 2025.