How to add attention layer to a Bi-LSTM

A possible solution is a custom layer that computes attention over the positional/temporal dimension:

    from tensorflow.keras.layers import Layer
    from tensorflow.keras import backend as K

    class Attention(Layer):
        def __init__(self, return_sequences=True):
            self.return_sequences = return_sequences
            super(Attention, self).__init__()

        def build(self, input_shape):
            # one score weight per feature, one bias per time step
            self.W = self.add_weight(name="att_weight", shape=(input_shape[-1], 1),
                                     initializer="normal")
            self.b = self.add_weight(name="att_bias", shape=(input_shape[1], 1),
                                     initializer="zeros")
            super(Attention, self).build(input_shape)

        def call(self, x):
            e = K.tanh(K.dot(x, self.W) + self.b)
            a = …

Read more
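The excerpt cuts off at the attention weights. As a hedged guess at the intended remainder of call(), the scores e are typically passed through a softmax over the time axis and used to re-weight the inputs:

    a = K.softmax(e, axis=1)              # attention weights over the time steps
    output = x * a                        # re-weight each time step
    if self.return_sequences:
        return output                     # keep the temporal dimension
    return K.sum(output, axis=1)          # weighted sum over time

Once completed, the layer can sit between a Bidirectional LSTM and the output head; the 30-step, single-feature input shape and the 64-unit LSTM below are only illustrative:

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Bidirectional, LSTM, Dense

    model = Sequential([
        Bidirectional(LSTM(64, return_sequences=True), input_shape=(30, 1)),
        Attention(return_sequences=False),    # collapses the 30 time steps into one vector
        Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])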

Keras input explanation: input_shape, units, batch_size, dim, etc

Units: The number of “neurons”, or “cells”, or whatever the layer has inside it. It’s a property of each layer, and yes, it’s related to the output shape (as we will see later). In your picture, except for the input layer, which is conceptually different from other layers, you have: Hidden layer 1: 4 units … Read more
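A minimal sketch of how units and input_shape show up in code, assuming a toy network with 3 input features, a 4-unit and a 2-unit hidden layer, and a single output (the sizes are illustrative, not necessarily those from the answer’s picture):

    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense

    model = Sequential([
        Dense(units=4, activation="relu", input_shape=(3,)),  # hidden layer 1: 4 units, expects 3 features
        Dense(units=2, activation="relu"),                    # hidden layer 2: 2 units
        Dense(units=1, activation="sigmoid"),                 # output layer: 1 unit
    ])
    model.summary()   # output shapes per layer: (None, 4), (None, 2), (None, 1)

Each layer’s units is exactly the size of the last dimension of its output, which is why only the first layer needs input_shape: every later layer infers its input size from the previous layer’s units.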

Loss & accuracy – Are these reasonable learning curves?

A little understanding of the actual meanings (and mechanics) of both loss and accuracy will be of much help here (refer also to this answer of mine, although I will reuse some parts)… For the sake of simplicity, I will limit the discussion to the case of binary classification, but the idea is generally applicable; … Read more
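A small numeric sketch of the distinction the answer builds on: binary cross-entropy loss depends on how confident the predicted probabilities are, while accuracy only checks which side of the 0.5 threshold they fall on (the sample values below are made up purely for illustration):

    import numpy as np

    y_true = np.array([1, 1, 0, 0])
    p_hesitant = np.array([0.55, 0.60, 0.45, 0.40])   # barely on the right side
    p_confident = np.array([0.95, 0.90, 0.05, 0.10])  # clearly on the right side

    def binary_crossentropy(y, p):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    def accuracy(y, p):
        return np.mean((p > 0.5).astype(int) == y)

    print(accuracy(y_true, p_hesitant), binary_crossentropy(y_true, p_hesitant))    # 1.0, ~0.55
    print(accuracy(y_true, p_confident), binary_crossentropy(y_true, p_confident))  # 1.0, ~0.08

Both sets of predictions give perfect accuracy, yet their losses differ by almost an order of magnitude, which is why loss and accuracy curves can move in seemingly inconsistent ways during training.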

Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?

The reason for this apparent performance discrepancy between categorical & binary cross-entropy is what user xtof54 has already reported in his answer below, i.e.: “the accuracy computed with the Keras method evaluate is just plain wrong when using binary_crossentropy with more than 2 labels”. I would like to elaborate more on this, demonstrate the … Read more
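A hedged sketch of the usual workarounds implied by that observation, assuming a multi-class model trained with one-hot labels (model, x_test, and y_test are placeholders, not from the answer): either name the metric explicitly instead of the ambiguous “accuracy”, or compute the multi-class accuracy yourself from the predictions.

    import numpy as np

    # Option 1: request the metric explicitly when compiling
    model.compile(loss="binary_crossentropy", optimizer="adam",
                  metrics=["categorical_accuracy"])

    # Option 2: compute the true multi-class accuracy manually
    proba = model.predict(x_test)                 # shape: (n_samples, n_classes)
    manual_acc = np.mean(np.argmax(proba, axis=1) == np.argmax(y_test, axis=1))
    print("multi-class accuracy:", manual_acc)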