Strange behaviour of the loss function in keras model, with pretrained convolutional base

Looks like I found the solution. As I suggested, the problem is with the BatchNormalization layers. They do three things:

1. subtract the mean and normalize by the standard deviation,
2. collect statistics on the mean and standard deviation using a running average,
3. train two additional parameters (two per node).

When one sets `trainable` to `False`, these two parameters freeze, and the layer also …
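A minimal sketch of this behaviour, assuming the `tf.keras` API (variable names like `gamma`, `beta`, `moving_mean`, `moving_variance` are standard Keras, not from the original post): the two trainable parameters per node are `gamma` and `beta`, while the running-average statistics live in `moving_mean` and `moving_variance`. Setting `trainable = False` moves `gamma` and `beta` out of the trainable weights.

```python
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 4))  # creates gamma, beta, moving_mean, moving_variance

# gamma and beta are the two trainable parameters (two per node);
# moving_mean / moving_variance hold the running-average statistics.
trainable_names = [w.name for w in bn.trainable_weights]
non_trainable_names = [w.name for w in bn.non_trainable_weights]
print(trainable_names)       # gamma, beta
print(non_trainable_names)   # moving_mean, moving_variance

bn.trainable = False
# After freezing, gamma and beta are no longer trainable either.
print(len(bn.trainable_weights))
```

Note this is only a sketch of where the parameters live; whether the frozen layer still normalizes with batch statistics at training time depends on the Keras version and how the layer is called.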