Transformer from Scratch · Training a Language Model
Cross-Entropy Loss for Next-Token Prediction
Introduction
This lesson shows how a model's logits and the next-token targets combine into a cross-entropy loss, and how to arrange both tensors in the shapes PyTorch's loss function expects.
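As a preview, the shape handling can be sketched as follows. This is a minimal illustration, not the lesson's own code: the sizes, the variable names, and the random logits standing in for a model's output are all assumptions made for the example. It shifts the tokens by one position to build targets, flattens the batch and sequence dimensions, and checks the result of `F.cross_entropy` against a manual computation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes chosen only for illustration.
batch_size, seq_len, vocab_size = 2, 5, 11

# Token ids standing in for a tokenized batch: shape (batch, seq_len + 1).
tokens = torch.randint(0, vocab_size, (batch_size, seq_len + 1))

# Inputs drop the last token; targets are the same tokens shifted left by one.
inputs = tokens[:, :-1]   # (batch, seq_len)
targets = tokens[:, 1:]   # (batch, seq_len)

# Random logits stand in for a model's output (assumption): (batch, seq_len, vocab).
logits = torch.randn(batch_size, seq_len, vocab_size)

# F.cross_entropy accepts (N, C) logits with (N,) class indices,
# so merge the batch and sequence dimensions.
flat_logits = logits.reshape(-1, vocab_size)   # (batch * seq_len, vocab)
flat_targets = targets.reshape(-1)             # (batch * seq_len,)

loss = F.cross_entropy(flat_logits, flat_targets)

# Same value by hand: mean negative log-probability of each correct token.
log_probs = F.log_softmax(flat_logits, dim=-1)
manual_loss = -log_probs[torch.arange(flat_targets.numel()), flat_targets].mean()

print(loss.shape, loss.item(), manual_loss.item())
```

The loss is a scalar because `F.cross_entropy` averages over all positions by default (`reduction="mean"`). Flattening is one common way to satisfy the expected shapes; permuting the logits to `(batch, vocab, seq_len)` is an alternative that the function also accepts.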