Transformer from Scratch · Text Generation
KV Cache: Intuition and Implementation
Introduction
By the end, you will understand why caching keys and values speeds up generation, and which shape, mask, and position constraints an implementation must preserve.