Neural Networks: From Fundamentals to Modern AI · From Neuron to MLP: Architecture and Forward Pass
Activation functions: sigmoid, ReLU, GELU, tanh — when and why
Introduction
The choice of activation function is one of the most fundamental design decisions in a neural network. Sigmoid and tanh dominated the 1990s, ReLU became the default for deep networks after AlexNet (2012), and GELU is now the standard in transformers such as BERT and GPT. This lesson compares the four canonical activations by formula and derivative, and examines saturation and the vanishing-gradient problem, dead neurons in ReLU, and modern variants (Leaky ReLU, ELU, and Swish, also known as SiLU). By the end, you will understand why ReLU won in feedforward networks and GELU in transformers.
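As a first orientation, the four activations and their derivatives can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function names are chosen here for readability, GELU is written in its exact form using the Gaussian CDF (libraries often use a tanh-based approximation instead), and the ReLU derivative at exactly 0 is set to 0 by convention.

```python
import math


def sigmoid(x: float) -> float:
    # Branch on the sign of x so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x: float) -> float:
    # d/dx sigmoid(x) = s * (1 - s); peaks at 0.25 when x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x: float) -> float:
    return math.tanh(x)


def tanh_grad(x: float) -> float:
    # d/dx tanh(x) = 1 - tanh(x)^2; peaks at 1 when x = 0.
    t = math.tanh(x)
    return 1.0 - t * t


def relu(x: float) -> float:
    return max(0.0, x)


def relu_grad(x: float) -> float:
    # Undefined at x = 0; this sketch assumes the common convention of 0 there.
    return 1.0 if x > 0 else 0.0


def gelu(x: float) -> float:
    # Exact form: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_grad(x: float) -> float:
    # Product rule: Phi(x) + x * phi(x), where phi is the standard normal PDF.
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


if __name__ == "__main__":
    # Print the derivative of each activation at a few inputs to compare
    # how quickly each gradient shrinks as |x| grows.
    print(f"{'x':>6} {'sigmoid':>10} {'tanh':>10} {'relu':>6} {'gelu':>10}")
    for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(
            f"{x:>6.1f} {sigmoid_grad(x):>10.2e} {tanh_grad(x):>10.2e} "
            f"{relu_grad(x):>6.1f} {gelu_grad(x):>10.2e}"
        )
```

Running the script prints each derivative at a handful of inputs. For large positive inputs, the sigmoid and tanh derivatives shrink toward zero (saturation), while the ReLU and GELU derivatives stay close to 1. For negative inputs, the ReLU derivative is exactly zero, which is the root of the dead-neuron problem.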