Transformer from Scratch · Decoder-Only Transformer
Mini-GPT Architecture
Introduction
You will assemble the high-level mini-GPT architecture: token and position embeddings, a decoder-only block stack, final normalization, and the language modeling head.
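To make the shape of that pipeline concrete, here is a minimal NumPy sketch of the forward pass. It is an illustration under stated assumptions rather than the lesson's implementation: the function and parameter names (`init_params`, `mini_gpt`, `w_qkv`, and so on), the pre-norm block layout, the GELU feed-forward layer with 4x expansion, and the omission of biases, learned normalization gains, and a batch dimension are all choices made here for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis (learned scale/shift omitted in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def gelu(x):
    # Tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def causal_self_attention(x, p, n_head):
    T, C = x.shape
    hd = C // n_head
    q, k, v = np.split(x @ p["w_qkv"], 3, axis=-1)        # each (T, C)
    # Split features into heads: (T, C) -> (n_head, T, hd).
    q, k, v = (a.reshape(T, n_head, hd).transpose(1, 0, 2) for a in (q, k, v))
    att = q @ k.transpose(0, 2, 1) / np.sqrt(hd)          # (n_head, T, T)
    # Causal mask: position t may not attend to positions after t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    att = softmax(np.where(future, -np.inf, att))
    out = (att @ v).transpose(1, 0, 2).reshape(T, C)      # merge heads
    return out @ p["w_proj"]

def block(x, p, n_head):
    # Pre-norm residual block (an assumption of this sketch).
    x = x + causal_self_attention(layer_norm(x), p, n_head)
    x = x + gelu(layer_norm(x) @ p["w_fc"]) @ p["w_out"]
    return x

def init_params(vocab_size, block_size, n_layer, n_embd, rng):
    w = lambda *shape: rng.normal(0.0, 0.02, size=shape)
    return {
        "wte": w(vocab_size, n_embd),     # token embeddings
        "wpe": w(block_size, n_embd),     # position embeddings
        "blocks": [
            {
                "w_qkv": w(n_embd, 3 * n_embd),
                "w_proj": w(n_embd, n_embd),
                "w_fc": w(n_embd, 4 * n_embd),
                "w_out": w(4 * n_embd, n_embd),
            }
            for _ in range(n_layer)
        ],
        "w_head": w(n_embd, vocab_size),  # language modeling head
    }

def mini_gpt(tokens, params, n_head):
    T = len(tokens)
    assert T <= params["wpe"].shape[0], "sequence longer than block_size"
    x = params["wte"][tokens] + params["wpe"][:T]   # (T, n_embd)
    for p in params["blocks"]:                      # decoder-only block stack
        x = block(x, p, n_head)
    x = layer_norm(x)                               # final normalization
    return x @ params["w_head"]                     # logits: (T, vocab_size)

rng = np.random.default_rng(0)
params = init_params(vocab_size=50, block_size=16, n_layer=2, n_embd=32, rng=rng)
logits = mini_gpt(np.array([1, 5, 7, 3]), params, n_head=4)
print(logits.shape)  # (4, 50)
```

Each row of `logits` scores every vocabulary entry as the next token after that position. Because of the causal mask, changing a later token leaves the logits at earlier positions unchanged.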