Neural Networks: From Fundamentals to Modern AI · From Neuron to MLP: Architecture and Forward Pass
Loss functions: MSE and Cross-Entropy — intuition and choice
Introduction
The loss function is the training "compass": it measures how wrong the network's predictions are, and its gradient tells the optimizer which direction to move the weights. This lesson contrasts two fundamental losses: Mean Squared Error (MSE) for regression and Cross-Entropy (CE) for classification. You will see why pairing MSE with a sigmoid output leads to vanishing gradients once the unit saturates, why CE with softmax is the "natural pair", how PyTorch's numerically stable fusion of log-softmax and negative log-likelihood (NLL) works, what label smoothing is, how to handle class imbalance (weighted CE, focal loss), and why a loss that turns into NaN almost always points to a log(0) blow-up or a poorly chosen (typically too large) learning rate. We also cover the relationship between loss functions and Maximum Likelihood Estimation (MLE), and the difference between mean and sum reduction.
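Two of the claims above can be checked with a few lines of arithmetic. The sketch below is a minimal illustration in plain Python, not PyTorch's implementation, and all function names in it are hypothetical helpers chosen for this example. It assumes a single sigmoid output with logit `z` and target `y`. For MSE the gradient with respect to the logit is 2(p - y)p(1 - p), which shrinks toward zero when the sigmoid saturates, while for cross-entropy it is simply p - y. The second half shows the log-sum-exp trick that makes a fused log-softmax + NLL computation safe for large logits.

```python
import math

def sigmoid(z):
    # Logistic function, written so that exp() never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def grad_mse_sigmoid(z, y):
    # d/dz of (sigmoid(z) - y)^2: the factor p * (1 - p) vanishes on saturation.
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

def grad_bce_sigmoid(z, y):
    # d/dz of -[y*log(p) + (1 - y)*log(1 - p)] with p = sigmoid(z):
    # the p * (1 - p) factor cancels, leaving p - y.
    return sigmoid(z) - y

def log_softmax(logits):
    # Log-sum-exp trick: subtract the max logit before exponentiating.
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def cross_entropy(logits, target):
    # Negative log-likelihood of the target class, computed from raw logits.
    return -log_softmax(logits)[target]

# A confidently wrong prediction: the logit is very negative, the target is 1.
z, y = -10.0, 1.0
g_mse = grad_mse_sigmoid(z, y)   # tiny (roughly -9e-5): learning stalls
g_bce = grad_bce_sigmoid(z, y)   # close to -1: a strong corrective signal

# Logits this large would overflow a naive exp(); the stable form stays finite.
big_logits = [1000.0, 0.0, -1000.0]
loss_right = cross_entropy(big_logits, 0)
loss_wrong = cross_entropy(big_logits, 2)

print(g_mse, g_bce, loss_right, loss_wrong)
```

In PyTorch, `torch.nn.CrossEntropyLoss` expects raw logits and applies the log-softmax and NLL steps internally, so the model should not apply a softmax of its own before that loss.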