memory issues when transforming np.array using to_categorical

You don’t need to use to_categorical, since I guess you are doing multi-label classification. To clear up the confusion once and for all(!), let me explain how the three cases differ.

If you are doing binary classification, meaning each sample may belong to only one of two classes e.g. cat vs dog or happy vs sad or positive review vs negative review, then:

  • The labels should be like [0 1 0 0 1 ... 0] with a shape of (n_samples,), i.e. each sample has a label of one (e.g. cat) or zero (e.g. dog).
  • The activation function used for the last layer is usually sigmoid (or any other function that outputs a value in the range [0, 1]).
  • The loss function usually used is binary_crossentropy.
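To make the shapes concrete, here is a minimal NumPy sketch of the binary case. It is not Keras code: the sigmoid and binary_crossentropy helpers below are hand-written stand-ins for what the framework computes, and the logits are made-up values for illustration only.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Mean binary cross-entropy; clipping avoids log(0).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(np.mean(per_sample))

# One 0/1 label per sample, shape (n_samples,). No one-hot encoding needed.
labels = np.array([0, 1, 0, 0, 1])

# Hypothetical raw outputs of a single-unit last layer (made-up numbers).
logits = np.array([-2.0, 3.0, -1.0, 0.5, 2.0])

probs = sigmoid(logits)                  # shape (n_samples,)
loss = binary_crossentropy(labels, probs)
print(labels.shape, probs.shape, loss)
```

The point is that the labels stay a flat vector, so no extra memory is spent on encoding them.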

If you are doing multi-class classification, meaning each sample may belong to only one of many classes e.g. cat vs dog vs lion or happy vs neutral vs sad or positive review vs neutral review vs negative review, then:

  • The labels should be either one-hot encoded, i.e. [1, 0, 0] corresponds to cat, [0, 1, 0] corresponds to dog and [0, 0, 1] corresponds to lion, in which case the labels have a shape of (n_samples, n_classes); or they can be integers (i.e. sparse labels) starting from zero, i.e. 0 for cat, 1 for dog and 2 for lion, in which case the labels have a shape of (n_samples,). The to_categorical function is used to convert sparse labels to one-hot encoded labels, if you wish to do so.
  • The activation function used is usually softmax.
  • The loss function used depends on the format of labels: if they are one-hot encoded, categorical_crossentropy is used and if they are sparse then sparse_categorical_crossentropy is used.
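The two label formats carry the same information, which the following NumPy sketch tries to show. The helpers (softmax, one_hot and the two loss functions) are hand-written approximations of what the Keras counterparts do, not the library's implementation, and the logits are made up. The one_hot helper plays the role of to_categorical here.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(sparse_labels, n_classes):
    # Stand-in for to_categorical: picks rows of the identity matrix.
    return np.eye(n_classes, dtype=np.float32)[sparse_labels]

def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    log_p = np.log(np.clip(probs, eps, 1.0))
    return float(np.mean(-np.sum(y_onehot * log_p, axis=1)))

def sparse_categorical_crossentropy(y_sparse, probs, eps=1e-7):
    # Look up the probability of the true class directly by index.
    picked = probs[np.arange(len(y_sparse)), y_sparse]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

n_classes = 3
sparse_labels = np.array([0, 2, 1, 2])              # shape (n_samples,)
onehot_labels = one_hot(sparse_labels, n_classes)   # shape (n_samples, n_classes)

# Hypothetical last-layer outputs (made-up numbers).
logits = np.array([[2.0, 0.1, -1.0],
                   [0.0, 0.5, 3.0],
                   [0.2, 1.5, 0.1],
                   [1.0, 1.0, 1.0]])
probs = softmax(logits)

loss_onehot = categorical_crossentropy(onehot_labels, probs)
loss_sparse = sparse_categorical_crossentropy(sparse_labels, probs)
print(loss_onehot, loss_sparse)
print(sparse_labels.size, onehot_labels.size)
```

Both losses agree, but the one-hot array holds n_classes times as many elements as the sparse one. With many samples and many classes, that blow-up is a likely source of memory trouble, so keeping sparse labels and using sparse_categorical_crossentropy is the cheaper option.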

If you are doing multi-label classification, meaning each sample may belong to zero, one or more than one class e.g. an image may contain both cat and dog, then:

  • The labels should be like [[1 0 0 1 ... 0], ..., [0 0 1 0 ... 1]] with a shape of (n_samples, n_classes). For example, a label [1 1] means that the corresponding sample belongs to both classes (e.g. cat and dog).
  • The activation function used is sigmoid, since presumably the classes are independent of one another.
  • The loss function used is binary_crossentropy.
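As a rough illustration of the multi-label case, the sketch below builds such a label matrix directly with NumPy, without to_categorical. The multi_hot helper and the label_sets input format (one list of class indices per sample) are assumptions made for this example, not something taken from the question.

```python
import numpy as np

def multi_hot(label_sets, n_classes):
    # Hypothetical helper: one row per sample, a 1 in every column whose
    # class index appears in that sample's list, 0 elsewhere.
    out = np.zeros((len(label_sets), n_classes), dtype=np.uint8)
    for row, classes in enumerate(label_sets):
        out[row, np.asarray(list(classes), dtype=np.intp)] = 1
    return out

# Made-up data: sample 0 has classes 0 and 3, sample 1 has none,
# sample 2 has class 2 only.
label_sets = [[0, 3], [], [2]]
labels = multi_hot(label_sets, n_classes=4)

print(labels)
print(labels.shape, labels.dtype, labels.nbytes)
```

Storing the matrix as uint8 is a design choice for this sketch: each entry takes one byte, so the array stays small, and it can be cast to a float type later if the training code requires it.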
