Robots Atlas

Grouped Query Attention (GQA)

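Resolves the trade-off between Multi-Head Attention (higher quality) and Multi-Query Attention (faster inference) by partitioning the query (Q) heads into groups, with every Q head in a group sharing a single key (K) head and value (V) head. Because fewer K and V heads are stored, the KV cache is smaller than in Multi-Head Attention, without significant quality loss.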
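The memory saving can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, the contiguous head-to-group mapping, and the model dimensions (32 layers, 32 query heads, head dimension 128, 4096 tokens, 2 bytes per value) are assumptions chosen for the example, not values taken from any particular model.

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the K/V head shared by the group containing q_head.

    Assumes contiguous grouping: the first group_size query heads use
    K/V head 0, the next group_size use K/V head 1, and so on.
    """
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be divisible by num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1,
                   bytes_per_value: int = 2) -> int:
    """Size of the KV cache in bytes; the leading 2 counts keys and values."""
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


# Hypothetical configuration, shared by all three variants.
cfg = dict(num_layers=32, head_dim=128, seq_len=4096)

mha_bytes = kv_cache_bytes(num_kv_heads=32, **cfg)  # one K/V head per Q head
gqa_bytes = kv_cache_bytes(num_kv_heads=8, **cfg)   # 8 groups of 4 Q heads
mqa_bytes = kv_cache_bytes(num_kv_heads=1, **cfg)   # all Q heads share one K/V head

for name, size in [("MHA", mha_bytes), ("GQA", gqa_bytes), ("MQA", mqa_bytes)]:
    print(f"{name}: {size / 2**20:.0f} MiB")

# Mapping of 8 query heads onto 2 K/V heads under the contiguous assumption.
print([kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)])
```

Under these assumptions, the cache size scales linearly with the number of K/V heads, so moving from 32 K/V heads to 8 cuts the cache to a quarter. Multi-Head Attention and Multi-Query Attention fall out as the two extremes: one K/V head per query head, and one K/V head in total.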

Category
Abstraction level