What are the differences between all these cross-entropy losses in Keras and TensorFlow?

There is just one cross (Shannon) entropy, defined as: H(P||Q) = -SUM_i P(X=i) log Q(X=i). In machine learning usage, P is the actual (ground truth) distribution, and Q is the predicted distribution. All the functions you listed are just helper functions which accept different ways to represent P and Q. There are basically 3 … Read more
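To make the definition concrete, here is a minimal sketch (plain NumPy assumed; the arrays and values are illustrative, not from the original answer) that evaluates H(P||Q) for a one-hot ground truth:

```python
import numpy as np

p = np.array([1.0, 0.0, 0.0])  # P: ground-truth distribution (one-hot here)
q = np.array([0.7, 0.2, 0.1])  # Q: predicted distribution

# H(P||Q) = -SUM_i P(X=i) * log Q(X=i)
cross_entropy = -np.sum(p * np.log(q))
print(cross_entropy)  # ~0.3567, i.e. -log(0.7)
```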

What is the difference between a sigmoid followed by the cross entropy and sigmoid_cross_entropy_with_logits in TensorFlow?

You’re confusing the cross-entropy for binary and multi-class problems. Multi-class cross-entropy: the formula that you use is correct, and it directly corresponds to tf.nn.softmax_cross_entropy_with_logits: -tf.reduce_sum(p * tf.log(q), axis=1). Here p and q are expected to be probability distributions over N classes. In particular, N can be 2, as in the following example: p = tf.placeholder(tf.float32, shape=[None, … Read more
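As a hedged illustration of that correspondence (TensorFlow 2.x assumed, so tf.math.log replaces the TF1-era tf.log used in the excerpt; the tensors are invented for the demo):

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0]])
p = tf.constant([[1.0, 0.0, 0.0]])  # ground-truth distribution over N=3 classes

# Manual version: softmax to get q, then -SUM_i p_i * log(q_i) per row.
q = tf.nn.softmax(logits)
manual = -tf.reduce_sum(p * tf.math.log(q), axis=1)

# Fused version: works on raw logits and is more numerically stable.
fused = tf.nn.softmax_cross_entropy_with_logits(labels=p, logits=logits)

print(manual.numpy())  # ~[0.2414]
print(fused.numpy())   # same value
```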

What’s the difference between sparse_softmax_cross_entropy_with_logits and softmax_cross_entropy_with_logits?

Having two different functions is a convenience, as they produce the same result. The difference is simple: For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1]. For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64. Labels … Read more
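A small sketch (TensorFlow 2.x assumed; logits and labels invented for the demo) showing the two label representations producing identical losses:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 0.5, -1.0],
                      [0.1, 1.5,  0.3]])

# Sparse variant: integer class indices, shape [batch_size], dtype int32/int64.
sparse_labels = tf.constant([0, 1])
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sparse_labels, logits=logits)

# Dense variant: one-hot rows, shape [batch_size, num_classes], float dtype.
dense_labels = tf.one_hot(sparse_labels, depth=3)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(
    labels=dense_labels, logits=logits)

print(sparse_loss.numpy())  # ~[0.2414, 0.4370]
print(dense_loss.numpy())   # identical values
```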

What is the meaning of the word logits in TensorFlow?

Logits is an overloaded term which can mean many different things: In math, logit is a function that maps probabilities ([0, 1]) to R ((-inf, inf)). A probability of 0.5 corresponds to a logit of 0. Negative logits correspond to probabilities less than 0.5, positive logits to probabilities greater than 0.5. In ML, it can be the vector of … Read more
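A quick sketch of the mathematical logit, the inverse of the sigmoid (plain NumPy assumed; the probe values are arbitrary):

```python
import numpy as np

def logit(p):
    """Map a probability in (0, 1) to a real number in (-inf, inf)."""
    return np.log(p / (1.0 - p))

print(logit(0.5))   #  0.0    -> probability 0.5 maps to logit 0
print(logit(0.25))  # -1.0986 -> probabilities < 0.5 give negative logits
print(logit(0.75))  #  1.0986 -> probabilities > 0.5 give positive logits
```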

How to choose cross-entropy loss in TensorFlow?

Preliminary facts: in a functional sense, the sigmoid is a special case of the softmax function when the number of classes equals 2. Both of them do the same operation: transform the logits (see below) to probabilities. In simple binary classification, there’s no big difference between the two; however, in the case of multinomial classification, sigmoid allows … Read more
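To illustrate that special-case relationship, here is a minimal sketch (TensorFlow 2.x assumed; the logits are invented for the demo): pairing each logit with a fixed zero logit and applying a two-class softmax reproduces the sigmoid, since e^x / (e^x + e^0) = 1 / (1 + e^-x):

```python
import tensorflow as tf

x = tf.constant([2.0, -1.0, 0.5])

via_sigmoid = tf.sigmoid(x)

# Two-class softmax over [x, 0]: the first component equals sigmoid(x).
two_class_logits = tf.stack([x, tf.zeros_like(x)], axis=1)
via_softmax = tf.nn.softmax(two_class_logits, axis=1)[:, 0]

print(via_sigmoid.numpy())  # ~[0.8808, 0.2689, 0.6225]
print(via_softmax.numpy())  # same values
```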