Tackling Class Imbalance: scaling contribution to loss and SGD

Why don’t you use the InfogainLoss layer to compensate for the imbalance in your training set?

The Infogain loss is defined using a weight matrix H (in your case 2-by-2). The meaning of its entries is:

[cost of predicting 1 when gt is 0,    cost of predicting 0 when gt is 0
 cost of predicting 1 when gt is 1,    cost of predicting 0 when gt is 1]

So you can set the entries of H to reflect how costly an error on class 0 is compared to an error on class 1.

You can find how to define the matrix H for Caffe in this thread; a rough sketch of the process follows below.
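As a minimal sketch of that approach (the weight values, the diagonal form of H, and the file name infogain_H.binaryproto are illustrative assumptions, not values taken from the thread), one can build H with pycaffe and point an "InfogainLoss" layer at the resulting file:

```python
import numpy as np
import caffe

# Purely illustrative weights: a diagonal H reduces to per-class weighted
# cross-entropy, here counting errors on the rare class 1 five times heavier.
# Fill in the entries according to the costs described above for your data.
H = np.array([[1.0, 0.0],
              [0.0, 5.0]], dtype=np.float32)

# Caffe expects the matrix as a 1x1xLxL blob stored in a binaryproto file.
blob = caffe.io.array_to_blobproto(H.reshape((1, 1, 2, 2)))
with open('infogain_H.binaryproto', 'wb') as f:
    f.write(blob.SerializeToString())

# The loss layer in the net prototxt would then reference this file, e.g.:
# layer {
#   name: "loss"
#   type: "InfogainLoss"
#   bottom: "prob"    # class probabilities (e.g. output of a Softmax layer)
#   bottom: "label"
#   top: "loss"
#   infogain_loss_param { source: "infogain_H.binaryproto" }
# }
```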

Regarding sample weights, you may find this post interesting: it shows how to modify the SoftmaxWithLoss layer to take sample weights into account.
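To illustrate what such a modification computes, here is a NumPy sketch of per-sample weighted softmax cross-entropy (forward pass only; this is not the actual Caffe C++ layer from that post, and the example weights are assumptions):

```python
import numpy as np

def weighted_softmax_loss(scores, labels, sample_weights):
    """Softmax cross-entropy where each example contributes
    to the loss in proportion to its own weight."""
    # softmax with the usual max-subtraction for numerical stability
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # per-example negative log-likelihood of the ground-truth class
    nll = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    # weight each example, e.g. give rare-class examples a larger weight
    return np.sum(sample_weights * nll) / np.sum(sample_weights)

# toy usage: the second example belongs to the rare class and is weighted 5x
scores = np.array([[2.0, 0.5], [0.3, 1.2]])
labels = np.array([0, 1])
weights = np.array([1.0, 5.0])  # assumed weights, e.g. inverse class frequency
print(weighted_softmax_loss(scores, labels, weights))
```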


Recently, a modification to the cross-entropy loss was proposed by Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár: Focal Loss for Dense Object Detection (ICCV 2017).
The idea behind focal loss is to assign a different weight to each example based on how difficult that example is to predict (rather than based on class size, etc.). From the brief time I got to experiment with this loss, it feels superior to "InfogainLoss" with class-size weights.
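For reference, the paper defines FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability assigned to the correct class. A minimal NumPy sketch of the binary case (gamma = 2 and alpha = 0.25 are the paper's defaults; the toy inputs are illustrative):

```python
import numpy as np

def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Focal loss for binary classification (Lin et al., ICCV 2017).
    probs: predicted probability of class 1; labels: 0 or 1."""
    # p_t is the probability the model assigns to the correct class
    p_t = np.where(labels == 1, probs, 1.0 - probs)
    # alpha_t balances the two classes; (1 - p_t)^gamma down-weights
    # well-classified (easy) examples so hard examples dominate the loss
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)
    return np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12))

# toy usage: the confidently correct example contributes far less
probs = np.array([0.95, 0.3])   # predicted P(class = 1)
labels = np.array([1, 1])
print(binary_focal_loss(probs, labels))
```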
