Transformer from Scratch · Multi-Head Attention

Why Multiple Attention Heads Matter

Multi-Head Attention

Introduction

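A single attention head can retrieve context, but it computes only one set of attention weights per position. Multiple heads each compute their own attention weights, so the model can learn different relationships in parallel. In this lesson you will see why we split the representation into heads.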
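
To make the idea of splitting the representation into heads concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the lesson's implementation: the helper names `split_heads` and `attention_weights` are hypothetical, the toy input is made up, and the learned query/key/value projections that a real layer applies are skipped so that only the split itself is visible.

```python
import math


def split_heads(x, num_heads):
    """Split each d_model-sized row of x into num_heads contiguous chunks.

    x: list of seq_len rows, each a list of d_model floats.
    Returns a list of num_heads matrices, each of shape seq_len x d_head.
    """
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    return [
        [row[h * d_head:(h + 1) * d_head] for row in x]
        for h in range(num_heads)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, k):
    """Scaled dot-product attention weights for one head (seq_len x seq_len)."""
    scale = math.sqrt(len(q[0]))
    return [
        softmax([sum(a * b for a, b in zip(q_row, k_row)) / scale for k_row in k])
        for q_row in q
    ]


# Toy input (assumed for illustration): 3 tokens, d_model = 4.
# Assumption: queries and keys are the raw slices, with no learned projections.
x = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]

heads = split_heads(x, num_heads=2)

# Each head computes its own attention pattern over the same 3 tokens.
weights_per_head = [attention_weights(h, h) for h in heads]

for i, w in enumerate(weights_per_head):
    print(f"head {i}:")
    for row in w:
        print("  ", [round(p, 3) for p in row])
```

Because each head sees a different slice of every token vector, the two heads in this toy example end up with different attention weights for the same sequence, which is the behavior a single head cannot provide.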