Transformer from Scratch · Decoder-Only Transformer
Language Modeling Head and Logits
Introduction
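You will learn how the language modeling head maps each token's final hidden representation to a vector of logits, one per vocabulary entry, and how logits, softmax, and cross-entropy relate: softmax turns logits into a probability distribution over the vocabulary, and cross-entropy is the negative log of the probability assigned to the correct next token.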
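As a preview, the chain from hidden state to loss can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the lesson's implementation: the names `lm_head`, `log_softmax`, and `cross_entropy`, the bias-free projection, the `vocab_size x d_model` weight layout, and the toy numbers are all chosen here for illustration.

```python
import math

def lm_head(hidden, weight):
    """Project one hidden vector (length d_model) to logits (length vocab_size).

    Assumption: weight is a list of vocab_size rows, each of length d_model,
    and there is no bias term.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]

def log_softmax(logits):
    """Numerically stable log-softmax using the log-sum-exp trick."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, target):
    """Negative log-probability of the target token id."""
    return -log_softmax(logits)[target]

# Toy setup (hypothetical values): d_model = 2, vocab_size = 3.
hidden = [1.0, -0.5]
weight = [[0.2, 0.1], [0.0, 0.4], [-0.3, 0.5]]

logits = lm_head(hidden, weight)                      # one score per vocabulary entry
probs = [math.exp(lp) for lp in log_softmax(logits)]  # softmax probabilities
loss = cross_entropy(logits, target=0)                # loss if token 0 is correct
```

Computing the loss from log-softmax directly, rather than taking the log of softmax probabilities, avoids overflow and underflow for large-magnitude logits. Adding the same constant to every logit leaves both the probabilities and the loss unchanged, which is why logits are only meaningful relative to one another.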