The "Attention Is All You Need" paper mentions positional encoding but leaves out some details. I am going to write up my understanding of those details.
The formulas are the following:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
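To make the formulas concrete, here is a minimal sketch in Python (NumPy) that builds the full positional-encoding matrix for a sentence; the function name and the seq_len/d_model parameter names are my own choices, not from the paper:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Build the (seq_len, d_model) positional-encoding matrix."""
    pe = np.zeros((seq_len, d_model))
    positions = np.arange(seq_len)[:, np.newaxis]                 # pos, shape (seq_len, 1)
    div_term = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    pe[:, 0::2] = np.sin(positions / div_term)                    # even dimensions 2i: sine
    pe[:, 1::2] = np.cos(positions / div_term)                    # odd dimensions 2i+1: cosine
    return pe
```

Each row of the returned matrix is the encoding for one position, and each pair of columns (2i, 2i+1) shares the same frequency.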
Once the PE (Positional Encoding) value for a position has been computed, it is added to the input embedding, as shown in the diagram on page 3 of the paper:
new_embedding[pos, 2i]   = embedding[pos, 2i]   + PE(pos, 2i)
new_embedding[pos, 2i+1] = embedding[pos, 2i+1] + PE(pos, 2i+1)
Here the embedding variable holds the embedding of each word in the sentence, and pos is the position of the word within that sentence. (The indexing is over the sentence, not the whole vocabulary.)
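For completeness, the addition step might look like this, reusing the positional_encoding sketch above; the random embedding array is just a stand-in for the learned word embeddings:

```python
seq_len, d_model = 6, 8                           # e.g. a 6-word sentence, tiny model dimension
embedding = np.random.randn(seq_len, d_model)     # stand-in for the learned word embeddings
new_embedding = embedding + positional_encoding(seq_len, d_model)
print(new_embedding.shape)                        # (6, 8): same shape, position info mixed in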
This part of the StatQuest video clearly explains how the embedding itself is calculated.