How to add an attention mechanism in Keras?
If you want to have attention along the time dimension, then this part of your code seems correct to me:

    from keras.layers import (LSTM, Dense, Flatten, Activation,
                              RepeatVector, Permute, Multiply)

    # `embedded` is the output of an Embedding layer; `units` is the LSTM width
    activations = LSTM(units, return_sequences=True)(embedded)

    # compute an importance score for each time step
    attention = Dense(1, activation='tanh')(activations)
    attention = Flatten()(attention)
    attention = Activation('softmax')(attention)

    # broadcast the scores back across the feature dimension
    attention = RepeatVector(units)(attention)
    attention = Permute([2, 1])(attention)

    # weight each time step by its attention score
    # (Multiply replaces the deprecated Keras 1 merge(..., mode='mul'))
    sent_representation = Multiply()([activations, attention])

You've worked …
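For context, here is a minimal sketch of how this attention-over-time block might be wired into a complete classifier. The vocabulary size, sequence length, embedding width, the final sum over time steps, and the sigmoid output head are illustrative assumptions, not part of the question or the answer above.

    # Minimal sketch: an LSTM text classifier with attention over time steps.
    # vocab_size, maxlen, and the final K.sum reduction are assumptions.
    from keras.models import Model
    from keras.layers import (Input, Embedding, LSTM, Dense, Flatten, Activation,
                              RepeatVector, Permute, Multiply, Lambda)
    import keras.backend as K

    vocab_size, maxlen, units = 10000, 100, 64   # hypothetical sizes

    inputs = Input(shape=(maxlen,))
    embedded = Embedding(vocab_size, 128)(inputs)          # (batch, maxlen, 128)

    # same attention block as above
    activations = LSTM(units, return_sequences=True)(embedded)   # (batch, maxlen, units)
    attention = Dense(1, activation='tanh')(activations)         # (batch, maxlen, 1)
    attention = Flatten()(attention)                              # (batch, maxlen)
    attention = Activation('softmax')(attention)                  # weights over time steps
    attention = RepeatVector(units)(attention)                    # (batch, units, maxlen)
    attention = Permute([2, 1])(attention)                        # (batch, maxlen, units)
    sent_representation = Multiply()([activations, attention])

    # collapse the weighted time steps into a single sentence vector
    sent_representation = Lambda(lambda x: K.sum(x, axis=1))(sent_representation)

    outputs = Dense(1, activation='sigmoid')(sent_representation)
    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    model.summary()

The softmax is taken over the flattened time axis, so each time step receives a weight that sums to one across the sequence; summing the weighted activations then yields a fixed-size sentence representation regardless of sequence length.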