Friday, February 23, 2024

Summary of Shusen Wang's video: Converting Attributes to Features

Here are a few ways to convert attributes into features (a small code sketch follows the list):

  • Discrete values: apply one-hot encoding, then learn embeddings from the one-hot input.
  • Continuous values: divide the value range into buckets, then one-hot encode the bucket a value falls into.
  • Large continuous values: apply log(1 + x), then treat the result as a regular continuous value.
    • An alternative is to convert large counts into ratios. For example, number of clicks into click rate.
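
A minimal NumPy sketch of these transformations (the vocabulary, bucket boundaries, and counts below are made-up examples):

    import numpy as np

    # Discrete value -> one-hot vector (an embedding layer would consume the index).
    genres = ["action", "comedy", "drama"]        # assumed vocabulary
    one_hot = np.eye(len(genres))[genres.index("comedy")]   # [0., 1., 0.]

    # Continuous value -> bucket index -> one-hot.
    age = 37
    boundaries = [18, 30, 45, 60]                 # assumed bucket boundaries
    bucket = np.digitize(age, boundaries)         # 2, i.e. the [30, 45) bucket
    age_one_hot = np.eye(len(boundaries) + 1)[bucket]

    # Large continuous value -> log(1 + x).
    clicks = 120_000
    log_clicks = np.log1p(clicks)

    # Alternative: turn the large count into a ratio, e.g. a click rate.
    impressions = 3_400_000
    click_rate = clicks / impressions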
An item's thumbnail image can also be converted into a vector to assist classification / prediction tasks.

User attributes and item attributes are relatively static and change little, which makes them perfect caching candidates. User stats and item stats, however, may change on every click / user interaction, so it is better to store the stats separately from the basic attributes.
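
One way to picture that separation (a sketch; the field names are made up):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ItemAttributes:      # rarely changes -> safe to cache for a long time
        item_id: int
        category: str
        brand: str

    @dataclass
    class ItemStats:           # updated on every interaction -> fetch fresh
        item_id: int
        clicks: int
        impressions: int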

Feature crosses are non-linear features derived from existing features through non-linear operations. A weighted sum of a vector (x1, x2, ..., xd) is a linear operation, while multiplying features together creates "crosses". For example, second-order feature crossing adds weighted terms of the form xi * xj, which are non-linear. To visualize why this helps, consider a vector with quantity and price per unit as features. The product of the two gives the total price exactly, while no linear combination of the two features can describe the price accurately.
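
A tiny numeric illustration of that example (the linear weights are arbitrary stand-ins):

    quantity, unit_price = 3.0, 4.5
    total_price = quantity * unit_price               # cross term: 13.5, the true price
    linear_guess = 0.7 * quantity + 0.3 * unit_price  # 3.45; no fixed weights can
                                                      # match quantity * unit_price
                                                      # for all inputs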

n features create on the order of n ** 2 second-order feature crosses, which can add a huge number of new dimensions. To reduce the dimensionality, view the cross weights as an n * n matrix and approximate it as the product of a low-rank matrix and its transpose.

Call the low-rank matrix V, with shape n * k and k much smaller than n. Then the cross weight at position (i, j) is approximated by the inner product of rows i and j of V:

weight(i, j) ≈ Vi · Vj,   i.e.   W ≈ V * V.transpose

Using matrix V turns the model into a Factorization Machine (FM), which is simply a linear model plus second-order feature crosses.
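
A minimal NumPy sketch of an FM forward pass (the weights are random stand-ins, not a trained model; the pairwise term uses the standard O(n * k) identity):

    import numpy as np

    n, k = 6, 3                      # n features, rank k << n
    rng = np.random.default_rng(0)
    w0 = rng.normal()                # global bias
    w = rng.normal(size=n)           # linear weights
    V = rng.normal(size=(n, k))      # low-rank factors: W ~= V @ V.T

    def fm(x):
        linear = w0 + w @ x
        # sum over i < j of (Vi . Vj) * xi * xj, computed in O(n * k):
        s = V.T @ x                  # shape (k,)
        pairwise = 0.5 * (s @ s - ((V ** 2).T @ (x ** 2)).sum())
        return linear + pairwise

    print(fm(rng.normal(size=n)))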

Note: FM has become less popular nowadays in recommendation systems.


Collaborative Filtering features that are based on userIds and itemIds, can be more accurate than tag matching. However, for a new item that hasn't had any user interaction, it is impossible to estimate Collaborative Filtering features accurately. Thus during cold start of a new item, tag matching is used to take top K similar items to get their Collaborative Filtering features - use their average as the initial guessed value for this feature. Keep using the guessed value, until enough user has visited the new item.
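
A sketch of that cold-start fallback, assuming each warm item carries a tag set and a CF embedding (the Jaccard tag similarity and the function name are my own illustration):

    import numpy as np

    def cold_start_cf_embedding(new_item_tags, warm_items, k=10):
        """warm_items: list of (tag_set, cf_embedding) pairs for existing items."""
        def tag_similarity(a, b):
            return len(a & b) / max(len(a | b), 1)   # Jaccard similarity on tags
        top_k = sorted(warm_items,
                       key=lambda item: tag_similarity(new_item_tags, item[0]),
                       reverse=True)[:k]
        # Average the CF embeddings of the most tag-similar items.
        return np.mean([emb for _, emb in top_k], axis=0)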

Use k-means clustering to divide the items into clusters; there could be thousands of them. Use each cluster's centroid as the index for searching. When a new item is created, match it against the centroids, for example taking the top 1000 clusters. Similarly, when a user arrives, take her last N interacted items and match them against the clusters. Once the clusters are found, pick items from them.
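
A sketch of centroid-based lookup using scikit-learn's KMeans (the cluster count, embedding size, and top_m are illustrative):

    import numpy as np
    from sklearn.cluster import KMeans

    item_vectors = np.random.rand(10_000, 64)       # stand-in item embeddings
    kmeans = KMeans(n_clusters=1_000, n_init="auto").fit(item_vectors)

    def top_clusters(query_vec, top_m=10):
        # Rank centroids by distance to the query; nearest clusters first.
        dists = np.linalg.norm(kmeans.cluster_centers_ - query_vec, axis=1)
        return np.argsort(dists)[:top_m]

    # e.g. for a new item: candidate_clusters = top_clusters(new_item_vector)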
