Random mask: randomly pick some attributes of an item and mask all of their keywords.
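A minimal sketch of how random masking could look; the attribute structure and the default mask value are assumptions, not taken from the source.

```python
import random

MASK_VALUE = "[default]"  # hypothetical placeholder for a masked attribute

def random_mask(item, mask_prob=0.5):
    """Randomly pick attributes and replace all of their keywords with a default value."""
    masked = {}
    for attribute, keywords in item.items():
        if random.random() < mask_prob:
            masked[attribute] = [MASK_VALUE]    # this attribute is masked out
        else:
            masked[attribute] = list(keywords)  # this attribute is kept as-is
    return masked

item = {"category": ["electronics"], "keywords": ["wireless", "mouse", "ergonomic"]}
print(random_mask(item))
```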
Dropout: a technique for deriving features from the same item during training. Dropout randomly takes away a percentage of an item's keywords (assuming the item has many keywords). In this way, the feature learned for an item is more generalized.
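A sketch of keyword dropout under the same assumed item representation; the 50% drop rate follows the text, while the keep-at-least-one rule is an added safeguard.

```python
import random

def keyword_dropout(keywords, drop_rate=0.5):
    """Randomly take away a fraction of an item's keywords (keep at least one)."""
    kept = [k for k in keywords if random.random() >= drop_rate]
    return kept if kept else [random.choice(keywords)]

print(keyword_dropout(["wireless", "mouse", "ergonomic", "usb", "gaming"]))
```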
Complementary features: split the keywords of one item into two sets, and map each set into a feature vector, so each vector sees only 50% of the keywords at a time (thus the two are complementary features). Since both come from the same item, the two vectors should have a large cosine similarity.
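A sketch of the complementary split; shuffling before splitting is an assumption about how the two halves are chosen.

```python
import random

def complementary_split(keywords):
    """Split an item's keywords into two disjoint halves; each half is later
    encoded into its own feature vector."""
    shuffled = list(keywords)
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

set_a, set_b = complementary_split(["wireless", "mouse", "ergonomic", "usb"])
print(set_a, set_b)
```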
Mask related features: calculate the mutual information between pairs of keywords on the items, MI(U, V) = sum over (u, v) of p(u, v) * log( p(u, v) / (p(u) * p(v)) ), where p(u, v) is the probability that keywords u and v appear on the same item. Then, for each keyword k, rank the other keywords by how related they are to k: mask the half that is most closely related to k and keep the 50% that is less related.
(In practice, the method of masking related features is expensive to compute and hard to maintain.)
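A rough sketch of the computation described above. It assumes the pairwise relatedness score is the per-pair term p(u,v) * log( p(u,v) / (p(u) * p(v)) ) and that the seed keyword itself is kept; both are assumptions about details the text leaves open.

```python
import math
import random
from collections import Counter
from itertools import combinations

def pairwise_relatedness(items):
    """items: list of keyword sets, one per item. Score each keyword pair by
    p(u,v) * log(p(u,v) / (p(u) * p(v))), with probabilities estimated from
    how often keywords (co-)occur on the same item."""
    n = len(items)
    single, pair = Counter(), Counter()
    for kws in items:
        kws = set(kws)
        single.update(kws)
        pair.update(frozenset(p) for p in combinations(kws, 2))
    scores = {}
    for pr, count in pair.items():
        u, v = tuple(pr)
        p_uv, p_u, p_v = count / n, single[u] / n, single[v] / n
        scores[pr] = p_uv * math.log(p_uv / (p_u * p_v))
    return scores

def mask_related(keywords, scores):
    """Pick a seed keyword, mask the half of the other keywords most related
    to it, and keep the less related half (plus the seed, by assumption)."""
    seed = random.choice(list(keywords))
    others = [k for k in keywords if k != seed]
    others.sort(key=lambda k: scores.get(frozenset((seed, k)), 0.0))  # ascending relatedness
    keep = others[: (len(others) + 1) // 2]
    return [seed] + keep

items = [{"mouse", "wireless"}, {"mouse", "usb"}, {"keyboard", "wireless"}]
scores = pairwise_relatedness(items)
print(mask_related(["mouse", "wireless", "usb"], scores))
```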
How to train: convert all records into vectors by applying the mask techniques above and multiplying the masked features by the embedding matrix to get a vector for each view. Pair the vectors, calculate their cosine similarities, and feed the similarities into a softmax layer. The expected cosine similarity of two vectors from the same item is 1, and of vectors from different items is 0. Take the cross-entropy loss between the calculated and expected values.
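A minimal PyTorch-style sketch of this training step, assuming the two masked views of each item have already been encoded into vectors a[i] and b[i]; the temperature scaling is a common addition, not mentioned in the source.

```python
import torch
import torch.nn.functional as F

def self_supervised_loss(a, b, temperature=0.1):
    """a, b: [batch, dim] vectors from two differently masked views of the same items.
    The softmax over each row of pairwise cosine similarities should put its mass
    on the matching item (target similarity 1 for the same item, 0 otherwise)."""
    a = F.normalize(a, dim=1)          # unit vectors, so dot product = cosine similarity
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature   # [batch, batch] pairwise cosine similarities
    labels = torch.arange(a.size(0))   # for row i, the matching column is i
    return F.cross_entropy(logits, labels)

# the vectors would come from multiplying masked features by the embedding matrix
# or encoder; random tensors stand in for them here
a = torch.randn(32, 64)
b = torch.randn(32, 64)
loss = self_supervised_loss(a, b)
```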
It is also possible to combine the self-supervised learning loss with the contrastive learning loss by simply summing the two.
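A sketch of the combination; the weighting coefficient is a hypothetical extension, since the text only mentions a plain summation.

```python
def total_loss(contrastive_loss, ssl_loss, ssl_weight=1.0):
    """Combine the contrastive-learning loss and the self-supervised loss by
    simple summation; ssl_weight is a hypothetical knob, 1.0 gives the plain sum."""
    return contrastive_loss + ssl_weight * ssl_loss
```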
Reference: https://www.youtube.com/watch?v=Ra3MVhneR9E (voiced in Chinese)