Why do we need to call zero_grad() in PyTorch?

In PyTorch, for every mini-batch during the training phase, we typically want to explicitly set the gradients to zero before starting backpropagation (i.e., updating the weights and biases), because PyTorch accumulates the gradients on subsequent backward passes. This accumulating behaviour is convenient while training RNNs or when we want to compute the gradient … Read more
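
As a rough illustration of where this call sits in a typical training loop (the model, data, and hyperparameters below are placeholders, not taken from the original answer):

```python
import torch
import torch.nn as nn

# Minimal sketch of a PyTorch training step; model and data are illustrative.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for inputs, targets in [(torch.randn(4, 10), torch.randn(4, 1))]:
    optimizer.zero_grad()             # clear gradients accumulated from the previous mini-batch
    outputs = model(inputs)           # forward pass
    loss = loss_fn(outputs, targets)
    loss.backward()                   # backward pass: gradients are *added* to the .grad buffers
    optimizer.step()                  # update weights and biases with the fresh gradients
```

Without the `zero_grad()` call, each `backward()` would add its gradients on top of the previous mini-batch's, which is rarely what you want outside of deliberate gradient accumulation.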

How to interpret caffe log with debug_info?

At first glance you can see this log section divided in two: [Forward] and [Backward]. Recall that neural network training is done via forward-backward propagation: a training example (batch) is fed to the net, and a forward pass outputs the current prediction. Based on this prediction a loss is computed. The loss is then derived, … Read more
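
To make the two halves of the log concrete, here is a conceptual sketch of one forward-backward iteration. It is written in PyTorch purely for illustration (the names are not Caffe API calls); Caffe's debug_info log reports the per-blob statistics produced by exactly these two phases.

```python
import torch
import torch.nn as nn

# One forward-backward iteration; network and data are illustrative placeholders.
net = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))
criterion = nn.CrossEntropyLoss()

batch = torch.randn(8, 10)           # a training batch is fed to the net...
labels = torch.randint(0, 2, (8,))

logits = net(batch)                  # [Forward]: layer activations flow bottom-up to a prediction
loss = criterion(logits, labels)     # a loss is computed from that prediction
loss.backward()                      # [Backward]: the loss gradient is propagated back through the layers
```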

Common causes of nans during training of neural networks

I came across this phenomenon several times. Here are my observations. Gradient blow-up. Reason: large gradients throw the learning process off-track. What you should expect: in the runtime log, look at the loss values per iteration. You’ll notice that the loss starts to grow significantly from iteration to iteration, and eventually the loss … Read more
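
A small monitoring sketch of that symptom (the model, data, and the deliberately oversized learning rate below are assumptions for demonstration, not part of the original answer): track the per-iteration loss and flag when it jumps sharply or overflows.

```python
import math
import torch
import torch.nn as nn

# Illustrative setup: a learning rate chosen far too high so the loss blows up.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=10.0)
loss_fn = nn.MSELoss()

prev = None
for iteration in range(20):
    x, y = torch.randn(16, 10), torch.randn(16, 1)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

    value = loss.item()
    if math.isnan(value) or math.isinf(value):
        # the end state of the pattern described above: loss has diverged
        print(f"iter {iteration}: loss diverged ({value}) - likely gradient blow-up")
        break
    if prev is not None and value > 10 * prev:
        # loss growing significantly from iteration to iteration
        print(f"iter {iteration}: loss jumped from {prev:.3g} to {value:.3g}")
    prev = value
```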