Gaussian Error Linear Units: Activating Neural Networks Beyond ReLU
Across the experiments, the GELU consistently achieves the best performance relative to ReLU and ELU, and can be considered a viable alternative to these previous nonlinearities.
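As background for this comparison, the GELU is defined as GELU(x) = x·Φ(x), where Φ is the standard Gaussian CDF, so it weights its input by how much probability mass lies below it rather than gating hard at zero like ReLU. A minimal NumPy sketch of this definition follows; the function name `gelu` and the use of SciPy's error function are our choices for illustration, not from the source:

```python
import numpy as np
from scipy.special import erf

def gelu(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF,
    # computed exactly via the error function:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

# Example: near zero the GELU is smooth and slightly negative
# for negative inputs, unlike ReLU's hard cutoff at 0.
print(gelu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
```

In practice the exact erf-based form above is one option; a common alternative in deep-learning libraries is a tanh-based approximation, which trades a small amount of accuracy for speed.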