Wednesday, February 21, 2024

Summary of Shusen Wang's video: Reducing negative samples and adjusting the predicted rate

Classifiers are often trained on unequal amounts of positive and negative data. It is common practice to downsample the negative data, keeping only a fraction a (0 < a < 1) of the negative samples while keeping all positive samples. Training on fewer negatives makes the predicted value larger than the actual rate of occurrence. However, the value can be adjusted as follows:

Let

  P_true = n_positive / (n_positive + n_negative)

  P_pred = n_positive / (n_positive + n_negative * a)

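Solving the second equation for the ratio n_negative / n_positive = (1 - P_pred) / (a * P_pred) and substituting it into the first, the adjusted expectation can be derived: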

   P_true = a * P_pred / ((1-P_pred)  + a * P_pred )


So the predicted expectation can be rescaled as if the model had been trained on the original data, without any negative samples removed.
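
As a quick sanity check of the adjustment formula, here is a minimal Python sketch. The function name adjust_prediction and the sample counts are made up for illustration and do not come from the video; a is assumed to be the fraction of negative samples kept.

```python
def adjust_prediction(p_pred: float, a: float) -> float:
    """Map a prediction made after negative downsampling back to the original scale.

    a is assumed to be the fraction of negative samples kept (0 < a <= 1).
    """
    if not 0.0 < a <= 1.0:
        raise ValueError("a must be in (0, 1]")
    if not 0.0 <= p_pred <= 1.0:
        raise ValueError("p_pred must be in [0, 1]")
    # P_true = a * P_pred / ((1 - P_pred) + a * P_pred)
    return a * p_pred / ((1.0 - p_pred) + a * p_pred)


# Hypothetical counts, chosen only for illustration.
n_positive, n_negative, a = 100, 9900, 0.1

# Rate of positives in the full data set.
p_true = n_positive / (n_positive + n_negative)
# Rate of positives after keeping only a fraction a of the negatives.
p_pred = n_positive / (n_positive + n_negative * a)
# Applying the adjustment should bring p_pred back to p_true.
p_adjusted = adjust_prediction(p_pred, a)
```

With these counts, p_pred is larger than p_true, and p_adjusted agrees with p_true up to floating-point rounding. With a = 1 (no downsampling) the function returns its input unchanged.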


Reference: https://youtu.be/kY4W46MQqsg?si=U33AhcwpvCvQ3G_2&t=526
