Softmax loss has become a standard built-in loss function in many deep learning frameworks, such as TensorFlow, Torch and Caffe. It is mainly used for classification. It has both advantages and disadvantages; this paper focuses on the latter.