Wednesday, August 02, 2023

Details in Positional Encoding for Transformer

The "Attention Is All You Need" paper mentioned positional encoding but left out some details. I am going to write down my understanding of those details.

The formula is the following:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))

PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

The paper says that i is the dimension index and d_model is the dimension of the embedding. If that were literally true, then for the last index i = d_model - 1, the term 2i would be out of bounds. So that reading cannot be correct.

2i and 2i+1 here refer to even and odd dimension indices: at even dimension indices, apply the sine function; at odd dimension indices, apply the cosine function. So i ranges over [0, d_model/2), and each i generates 2 dimensions.
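Here is a minimal sketch of that interpretation in Python with NumPy. The function name, max_len, and d_model are my own choices (not from the paper's code), and d_model is assumed to be even:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding: sine at even dims, cosine at odd dims."""
    pe = np.zeros((max_len, d_model))
    position = np.arange(max_len)[:, np.newaxis]   # shape (max_len, 1)
    i = np.arange(d_model // 2)                    # i in [0, d_model/2)
    div_term = 10000 ** (2 * i / d_model)          # 10000^(2i/d_model)
    pe[:, 0::2] = np.sin(position / div_term)      # even dimensions: sine
    pe[:, 1::2] = np.cos(position / div_term)      # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=512)
print(pe.shape)  # (50, 512)
```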

Once the PE (Positional Encoding) value for a position is computed, it is added to the embedding of the input, as shown in the diagram on page 3 of the paper.

new_embedding[pos, 2i] = embedding[pos, 2i] + PE(pos, 2i)

new_embedding[pos, 2i+1] = embedding[pos, 2i+1] + PE(pos, 2i+1)

The embedding variable here holds the embedding of each word in a sentence, and pos is the word's position within that sentence. (It indexes a sentence, not the whole dictionary.)
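A short sketch of the addition step, reusing positional_encoding from above. The array names and shapes are illustrative assumptions, with one row per word in the sentence:

```python
sentence_length, d_model = 10, 512
rng = np.random.default_rng(0)
embedding = rng.normal(size=(sentence_length, d_model))  # word embeddings for one sentence

# Element-wise addition of the positional encoding to the word embeddings
new_embedding = embedding + positional_encoding(sentence_length, d_model)
print(new_embedding.shape)  # (10, 512)
```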

This part of the StatQuest video clearly explains how the embedding is calculated.

