Neural network always predicts the same class

My network does always predict the same class. What is the problem?

I had this a couple of times. Although I’m currently too lazy to go through your code, I think I can give some general hints which might also help others who have the same symptom but probably different underlying problems.

Debugging Neural Networks

Fitting one item datasets

For every class i the network should be able to predict, try the following:

  1. Create a dataset of only one data point of class i.
  2. Fit the network to this dataset.
  3. Does the network learn to predict “class i”?

If this doesn’t work, there are four possible error sources:

  1. Buggy training algorithm: Try a smaller model, print a lot of values which are calculated in between and see if those match your expectation.
    1. Dividing by 0: Add a small number to the denominator
    2. Logarithm of 0 / negativ number: Like dividing by 0
  2. Data: It is possible that your data has the wrong type. For example, it might be necessary that your data is of type float32 but actually is an integer.
  3. Model: It is also possible that you just created a model which cannot possibly predict what you want. This should be revealed when you try simpler models.
  4. Initialization / Optimization: Depending on the model, your initialization and your optimization algorithm might play a crucial role. For beginners who use standard stochastic gradient descent, I would say it is mainly important to initialize the weights randomly (each weight a different value). – see also: this question / answer

Learning Curve

See sklearn for details.

Learning Curve showing the training error / test error curves to approach each other

The idea is to start with a tiny training dataset (probably only one item). Then the model should be able to fit the data perfectly. If this works, you make a slightly larger dataset. Your training error should slightly go up at some point. This reveals your models capacity to model the data.

Data analysis

Check how often the other class(es) appear. If one class dominates the others (e.g. one class is 99.9% of the data), this is a problem. Look for “outlier detection” techniques.

More

  • Learning rate: If your network doesn’t improve and get only slightly better than random chance, try reducing the learning rate. For computer vision, a learning rate of 0.001 is often used / working. This is also relevant if you use Adam as an optimizer.
  • Preprocessing: Make sure you use the same preprocessing for training and testing. You might see differences in the confusion matrix (see this question)

Common Mistakes

This is inspired by reddit:

  • You forgot to apply preprocessing
  • Dying ReLU
  • Too small / too big learning rate
  • Wrong activation function in final layer:
    • Your targets are not in sum one? -> Don’t use softmax
    • Single elements of your targets are negative -> Don’t use Softmax, ReLU, Sigmoid. tanh might be an option
  • Too deep network: You fail to train. Try a simpler neural network first.
  • Vastly unbalanced data: You might want to look into imbalanced-learn

Leave a Comment