Neural Networks: From Fundamentals to Modern AI · From Neuron to MLP: Architecture and Forward Pass

Activation functions: sigmoid, ReLU, GELU, tanh — when and why


Introduction

The choice of activation function is one of the most fundamental design decisions in a neural network. Sigmoid and tanh dominated the 1990s. ReLU revolutionized deep networks after AlexNet (2012). GELU has become the standard in transformers such as BERT and GPT. This lesson compares the four canonical activations along several axes: their formulas, their derivatives, the vanishing-gradient problem, saturation, and dead neurons in ReLU. It then covers modern variants: Leaky ReLU, ELU, and Swish/SiLU (SiLU is Swish with its scale parameter β fixed to 1). By the end you will understand why ReLU won in feedforward networks and GELU won in transformers.
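To make the comparison concrete before going through each function, here is a minimal sketch of the four canonical activations and their derivatives. It uses plain Python floats and the `math` module. The function names are illustrative and are not any framework's API. GELU is written in its exact form, x·Φ(x) with Φ the standard normal CDF, rather than the tanh approximation some implementations use. The derivatives show the key contrast. Sigmoid's slope never exceeds 0.25, tanh's peaks at 1, ReLU's is exactly 0 or 1, and GELU's is smooth and nonzero for moderately negative inputs.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable logistic function: output lies in (0, 1)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for large negative x
    return z / (1.0 + z)

def sigmoid_grad(x: float) -> float:
    # sigma'(x) = sigma(x) * (1 - sigma(x)); maximum 0.25 at x = 0,
    # and close to 0 once |x| is large (saturation)
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x: float) -> float:
    # tanh'(x) = 1 - tanh(x)^2; maximum 1.0 at x = 0, also saturates
    return 1.0 - math.tanh(x) ** 2

def relu(x: float) -> float:
    return max(0.0, x)

def relu_grad(x: float) -> float:
    # 1 for positive inputs, 0 otherwise (value at x = 0 chosen by convention);
    # a unit whose inputs stay negative gets zero gradient ("dead neuron")
    return 1.0 if x > 0 else 0.0

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), Phi = standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_grad(x: float) -> float:
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), phi = standard normal PDF
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

if __name__ == "__main__":
    for x in (-3.0, 0.0, 3.0):
        print(f"x={x:+.1f}  sigmoid'={sigmoid_grad(x):.4f}  "
              f"tanh'={tanh_grad(x):.4f}  relu'={relu_grad(x):.1f}  "
              f"gelu'={gelu_grad(x):.4f}")
    # Best case for 10 stacked sigmoid layers with unit weights:
    # the chain rule multiplies ten factors of at most 0.25 each
    print(0.25 ** 10)  # ≈ 9.5e-07
```

The last line shows the vanishing-gradient problem in its simplest form. Even at the point of steepest slope, the backpropagated signal through ten sigmoid layers shrinks by roughly six orders of magnitude. ReLU's unit slope on positive inputs avoids this shrinkage.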