Loss & accuracy – Are these reasonable learning curves?

A little understanding of the actual meanings (and mechanics) of both loss and accuracy will be of much help here (refer also to this answer of mine, although I will reuse some parts)…

For the sake of simplicity, I will limit the discussion to the case of binary classification, but the idea is generally applicable; here is the equation of the (logistic) loss:

enter image description here

  • y[i] are the true labels (0 or 1)
  • p[i] are the predictions (real numbers in [0,1]), usually interpreted as probabilities
  • output[i] (not shown in the equation) is the rounding of p[i], in order to convert them also to 0 or 1; it is this quantity that enters the calculation of accuracy, implicitly involving a threshold (normally at 0.5 for binary classification), so that if p[i] > 0.5, then output[i] = 1, otherwise if p[i] <= 0.5, output[i] = 0.

Now, let’s suppose that we have a true label y[k] = 1, for which, at an early point during training, we make a rather poor prediction of p[k] = 0.1; then, plugging the numbers to the loss equation above:

  • the contribution of this sample to the loss, is loss[k] = -log(0.1) = 2.3
  • since p[k] < 0.5, we’ll have output[k] = 0, hence its contribution to the accuracy will be 0 (wrong classification)

Suppose now that, an the next training step, we are getting better indeed, and we get p[k] = 0.22; now we have:

  • loss[k] = -log(0.22) = 1.51
  • since it still is p[k] < 0.5, we have again a wrong classification (output[k] = 0) with zero contribution to the accuracy

Hopefully you start getting the idea, but let’s see one more later snapshot, where we get, say, p[k] = 0.49; then:

  • loss[k] = -log(0.49) = 0.71
  • still output[k] = 0, i.e. wrong classification with zero contribution to the accuracy

As you can see, our classifier indeed got better in this particular sample, i.e. it went from a loss of 2.3 to 1.5 to 0.71, but this improvement has still not shown up in the accuracy, which cares only for correct classifications: from an accuracy viewpoint, it doesn’t matter that we get better estimates for our p[k], as long as these estimates remain below the threshold of 0.5.

The moment our p[k] exceeds the threshold of 0.5, the loss continues to decrease smoothly as it has been so far, but now we have a jump in the accuracy contribution of this sample from 0 to 1/n, where n is the total number of samples.

Similarly, you can confirm by yourself that, once our p[k] has exceeded 0.5, hence giving a correct classification (and now contributing positively to the accuracy), further improvements of it (i.e getting closer to 1.0) still continue to decrease the loss, but have no further impact to the accuracy.

Similar arguments hold for cases where the true label y[m] = 0 and the corresponding estimates for p[m] start somewhere above the 0.5 threshold; and even if p[m] initial estimates are below 0.5 (hence providing correct classifications and already contributing positively to the accuracy), their convergence towards 0.0 will decrease the loss without improving the accuracy any further.

Putting the pieces together, hopefully you can now convince yourself that a smoothly decreasing loss and a more “stepwise” increasing accuracy not only are not incompatible, but they make perfect sense indeed.


On a more general level: from the strict perspective of mathematical optimization, there is no such thing called “accuracy” – there is only the loss; accuracy gets into the discussion only from a business perspective (and a different business logic might even call for a threshold different than the default 0.5). Quoting from my own linked answer:

Loss and accuracy are different things; roughly speaking, the accuracy is what we are actually interested in from a business perspective, while the loss is the objective function that the learning algorithms (optimizers) are trying to minimize from a mathematical perspective. Even more roughly speaking, you can think of the loss as the “translation” of the business objective (accuracy) to the mathematical domain, a translation which is necessary in classification problems (in regression ones, usually the loss and the business objective are the same, or at least can be the same in principle, e.g. the RMSE)…

Leave a Comment