How to visualize RNN/LSTM gradients in Keras/TensorFlow?

Gradients can be fetched w.r.t. weights or w.r.t. outputs – we’ll be needing the latter. Further, for best results, an architecture-specific treatment is desired. The code and explanations below cover every possible case of a Keras/TF RNN and should be easily extendable to any future API changes. Completeness: the code shown is a simplified version – the full version … Read more
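For instance, here is a minimal TF2-style sketch (my own toy model and data, not the full code referenced above) of fetching gradients of the loss w.r.t. an LSTM layer’s outputs with tf.GradientTape:

    import numpy as np
    import tensorflow as tf

    # Toy model: expose the LSTM output as an explicit intermediate tensor.
    inp = tf.keras.Input(shape=(10, 8))
    lstm_out = tf.keras.layers.LSTM(16, return_sequences=True)(inp)
    preds = tf.keras.layers.Dense(1)(lstm_out)
    grad_model = tf.keras.Model(inp, [lstm_out, preds])

    x = np.random.randn(4, 10, 8).astype("float32")
    y = np.random.randn(4, 10, 1).astype("float32")

    with tf.GradientTape() as tape:
        out, p = grad_model(x, training=True)
        loss = tf.reduce_mean(tf.keras.losses.mse(y, p))

    # Gradient of the loss w.r.t. the LSTM *outputs*: shape (batch, timesteps, units)
    grads = tape.gradient(loss, out)
    print(grads.shape)

The resulting tensor has one gradient per timestep per unit, which is what you would then plot (e.g. as a heatmap over timesteps).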

ValueError: Tensor must be from the same graph as Tensor with Bidirectional RNN in TensorFlow

TensorFlow records all operations on a computation graph. This graph defines which operations feed into which, linking everything together so that TensorFlow can follow the steps you have set up to produce your final output. If you try to input a Tensor or operation from one graph into a … Read more
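As an illustration, here is a minimal TF1-style (tf.compat.v1) sketch of what triggers the error and the usual fix; the variable names and shapes are mine:

    import tensorflow.compat.v1 as tf
    tf.disable_eager_execution()

    g1 = tf.Graph()
    with g1.as_default():
        x = tf.placeholder(tf.float32, shape=[None, 10], name="x")

    g2 = tf.Graph()
    with g2.as_default():
        w = tf.get_variable("w", shape=[10, 5])
        # y = tf.matmul(x, w)  # ValueError: Tensor must be from the same graph as Tensor ...

    # Fix: build every placeholder, layer (including Bidirectional RNNs), loss and
    # optimizer inside one and the same graph.
    with g1.as_default():
        w1 = tf.get_variable("w1", shape=[10, 5])
        y = tf.matmul(x, w1)  # same graph as x, so this works
        with tf.Session(graph=g1) as sess:
            sess.run(tf.global_variables_initializer())
            print(sess.run(y, feed_dict={x: [[0.0] * 10]}))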

ValueError: Input 0 is incompatible with layer lstm_13: expected ndim=3, found ndim=4

I solved the problem by making the input size (95000, 360, 1) and the output size (95000, 22), and changing the input shape to (360, 1) where the model is defined:

    model = Sequential()
    model.add(LSTM(22, input_shape=(360, 1)))
    model.add(Dense(22, activation='softmax'))
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    print(model.summary())
    model.fit(ml2_train_input, ml2_train_output_enc, epochs=2, batch_size=500)

TimeDistributed(Dense) vs Dense in Keras – Same number of parameters

TimeDistributedDense applies the same Dense layer to every time step during GRU/LSTM cell unrolling, so the error function is computed between the predicted label sequence and the actual label sequence (which is normally the requirement for sequence-to-sequence labeling problems). However, with return_sequences=False, the Dense layer is applied only once, at the last cell. This is normally … Read more
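A small sketch (my own toy shapes) showing why the parameter counts come out identical: Dense weights depend only on the last (feature) dimension, and since Keras 2 a Dense layer applied to a 3D tensor is broadcast over timesteps anyway.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    inp = tf.keras.Input(shape=(20, 32))               # (timesteps, features)
    seq = layers.GRU(64, return_sequences=True)(inp)   # shared encoder for both heads

    out_td = layers.TimeDistributed(layers.Dense(10))(seq)
    out_d = layers.Dense(10)(seq)                      # Dense broadcast over timesteps

    m_td = models.Model(inp, out_td)
    m_d = models.Model(inp, out_d)

    # Both Dense heads have 64*10 + 10 = 650 parameters; the totals are identical.
    print(m_td.count_params(), m_d.count_params())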

How do I create a variable-length input LSTM in Keras?

I am not clear about the embedding procedure, but here is a way to implement a variable-length input LSTM: just do not specify the timespan dimension when building the LSTM.

    import keras.backend as K
    from keras.layers import LSTM, Input

    I = Input(shape=(None, 200))  # unknown timespan, fixed feature size
    lstm = LSTM(20)
    f = K.function(inputs=[I], … Read more
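A runnable TF2-style variant of the same idea (toy shapes of my choosing), feeding batches of different lengths through one model:

    import numpy as np
    import tensorflow as tf

    inp = tf.keras.Input(shape=(None, 200))             # unknown timespan, 200 features
    out = tf.keras.layers.LSTM(20)(inp)
    model = tf.keras.Model(inp, out)

    short_batch = np.random.random((3, 50, 200)).astype("float32")
    long_batch = np.random.random((3, 300, 200)).astype("float32")
    print(model.predict(short_batch, verbose=0).shape)  # (3, 20)
    print(model.predict(long_batch, verbose=0).shape)   # (3, 20)

Each individual batch still has to be rectangular, so sequences of different lengths within one batch need padding (plus a Masking layer) or bucketing by length.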

What’s the difference between “hidden” and “output” in PyTorch LSTM?

I made a diagram. The names follow the PyTorch docs, although I renamed num_layers to w. output comprises all the hidden states in the last layer (“last” depth-wise, not time-wise). (h_n, c_n) comprises the hidden states after the last timestep, t = n, so you could potentially feed them into another LSTM. The batch dimension … Read more
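A quick sketch (toy sizes, batch_first=True) that checks the relationship described above:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=10, hidden_size=16, num_layers=3, batch_first=True)
    x = torch.randn(4, 7, 10)                  # (batch, time, features)

    output, (h_n, c_n) = lstm(x)
    print(output.shape)                        # (4, 7, 16): every timestep, top layer only
    print(h_n.shape, c_n.shape)                # (3, 4, 16): last timestep, every layer

    print(torch.allclose(output[:, -1, :], h_n[-1]))   # True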

Pytorch – RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed

The problem is likely that your training loop doesn’t detach or repackage the hidden state in between batches. If so, then loss.backward() is trying to back-propagate all the way through to the start of time, which works for the first batch but not for the second, because the graph for the first batch has been … Read more
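A minimal sketch (toy model and data, not the asker’s code) of the usual fix, detaching the hidden state between batches:

    import torch
    import torch.nn as nn

    lstm = nn.LSTM(input_size=5, hidden_size=8, batch_first=True)
    head = nn.Linear(8, 1)
    opt = torch.optim.SGD(list(lstm.parameters()) + list(head.parameters()), lr=0.01)

    hidden = None
    for step in range(3):                      # pretend these are consecutive batches
        x = torch.randn(2, 4, 5)
        y = torch.randn(2, 4, 1)

        if hidden is not None:
            # "Repackage": keep the values, drop the old graph. Without this, the
            # second backward() hits the already-freed graph from the previous batch.
            hidden = tuple(h.detach() for h in hidden)

        out, hidden = lstm(x, hidden)
        loss = nn.functional.mse_loss(head(out), y)

        opt.zero_grad()
        loss.backward()
        opt.step()

Alternatively, reset hidden to None at the start of each batch if you don’t need statefulness across batches.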