Classifiers are often trained on unequal amounts of positive and negative data. A common practice is to downsample the negative data, keeping only a fraction a of the negatives relative to the positives. Downsampling the negatives makes the predicted probability larger than the actual occurrence rate. However, the value can be adjusted as follows:
Let
P_true = n_positive / (n_positive + n_negative)
P_pred = n_positive / (n_positive + n_negative * a)
Solving the second equation for n_negative and substituting it into the first gives the adjusted expectation:
P_true = a * P_pred / ((1-P_pred) + a * P_pred )
So the predicted probability can be rescaled back to the value the model would produce if it had been trained on the full, undownsampled data.
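The adjustment above can be sketched as a small Python function (the name adjust_probability is mine, not from the source), with a numeric sanity check using made-up counts:

```python
def adjust_probability(p_pred: float, a: float) -> float:
    """Map a probability predicted after downsampling negatives
    (keeping a fraction `a` of them) back to the full-data scale:
    P_true = a * P_pred / ((1 - P_pred) + a * P_pred)."""
    return a * p_pred / ((1.0 - p_pred) + a * p_pred)

# Sanity check: 100 positives, 900 negatives, keep a = 0.1 of the negatives.
n_pos, n_neg, a = 100, 900, 0.1
p_true = n_pos / (n_pos + n_neg)        # 0.1 on the full data
p_pred = n_pos / (n_pos + n_neg * a)    # ~0.526 after downsampling
assert abs(adjust_probability(p_pred, a) - p_true) < 1e-12
```

Note that a = 1 leaves the probability unchanged, as expected when no downsampling is done.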
Reference: https://youtu.be/kY4W46MQqsg?si=U33AhcwpvCvQ3G_2&t=526