Why use softmax as opposed to standard normalization?

Softmax has one nice attribute compared with standard normalisation.

It reacts to low stimulation of your neural net (think: blurry image) with a rather uniform distribution, and to high stimulation (i.e. large numbers, think: crisp image) with probabilities close to 0 and 1.

Standard normalisation, by contrast, does not care, as long as the proportions are the same.
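Since softmax and std_norm aren't defined anywhere in this answer, here is a minimal sketch of what is meant (helper names taken from the snippets below, implemented here with numpy; these return numpy arrays, so the printed output may look slightly different, but the values are what matter):

import numpy as np

def softmax(x):
    # e^x_i / sum_j e^x_j -- exponentiation amplifies differences,
    # so scaling the inputs up changes the output
    e = np.exp(x)
    return e / e.sum()

def std_norm(x):
    # x_i / sum_j x_j -- a common scale factor cancels in the ratio,
    # so scaling the inputs leaves the output unchanged
    x = np.asarray(x, dtype=float)
    return x / x.sum()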

Have a look at what happens when softmax gets inputs 10 times larger, i.e. your neural net got a crisp image and a lot of neurons got activated:

>>> softmax([1,2])              # blurry image of a ferret
[0.26894142,      0.73105858]   #     it is a cat perhaps !?
>>> softmax([10,20])            # crisp image of a cat
[0.0000453978687, 0.999954602]  #     it is definitely a CAT !

And then compare it with standard normalisation:

>>> std_norm([1,2])                      # blurry image of a ferret
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
>>> std_norm([10,20])                    # crisp image of a cat
[0.3333333333333333, 0.6666666666666666] #     it is a cat perhaps !?
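(One practical aside if you try this with much larger activations: a naive np.exp overflows in float64 once inputs get past roughly 709, so real softmax implementations usually subtract the maximum first. This cancels in the ratio and leaves the result unchanged. A sketch:)

def softmax_stable(x):
    # softmax(x) == softmax(x - c) for any constant c,
    # so subtracting max(x) avoids overflow in np.exp
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()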
