Transformer from Scratch · Multi-Head Attention

Linear Projections for Q, K and V

Introduction

Multi-head attention begins with linear projections. In this lesson you will see how a single projection can produce Q, K and V for all heads at once, and how to split the last tensor dimension safely.
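As a preview, here is a minimal sketch of that idea in NumPy. It is not the lesson's own implementation. The function name `project_qkv`, the weight layout (Q columns first, then K, then V) and the example sizes are assumptions chosen for illustration. One matrix multiply produces a tensor whose last dimension has size `3 * d_model`, and that dimension is then reshaped into `(3, num_heads, head_dim)` after checking that `d_model` divides evenly by the number of heads.

```python
import numpy as np


def project_qkv(x, w_qkv, b_qkv, num_heads):
    """Fused Q/K/V projection followed by a split into heads.

    x:     (batch, seq_len, d_model)
    w_qkv: (d_model, 3 * d_model), assumed layout: [Q | K | V] along columns
    b_qkv: (3 * d_model,)
    Returns q, k, v, each of shape (batch, num_heads, seq_len, head_dim).
    """
    batch, seq_len, d_model = x.shape

    # Guard the split: the reshape below only makes sense if every head
    # receives the same number of features.
    if d_model % num_heads != 0:
        raise ValueError(
            f"d_model={d_model} is not divisible by num_heads={num_heads}"
        )
    if w_qkv.shape != (d_model, 3 * d_model):
        raise ValueError(f"unexpected weight shape {w_qkv.shape}")
    head_dim = d_model // num_heads

    # One matmul instead of three: (batch, seq_len, 3 * d_model).
    qkv = x @ w_qkv + b_qkv

    # Split only the last dimension: 3 * d_model -> (3, num_heads, head_dim).
    qkv = qkv.reshape(batch, seq_len, 3, num_heads, head_dim)

    # Reorder to (3, batch, num_heads, seq_len, head_dim) so that each head
    # sees a (seq_len, head_dim) matrix.
    qkv = qkv.transpose(2, 0, 3, 1, 4)
    return qkv[0], qkv[1], qkv[2]


# Example sizes are arbitrary and only for demonstration.
rng = np.random.default_rng(0)
batch, seq_len, d_model, num_heads = 2, 5, 8, 2
x = rng.standard_normal((batch, seq_len, d_model))
w_qkv = rng.standard_normal((d_model, 3 * d_model))
b_qkv = np.zeros(3 * d_model)

q, k, v = project_qkv(x, w_qkv, b_qkv, num_heads)
print(q.shape, k.shape, v.shape)
```

With this layout, the fused result matches what three separate projections using the column blocks `w_qkv[:, :d_model]`, `w_qkv[:, d_model:2 * d_model]` and `w_qkv[:, 2 * d_model:]` would give. The reshape order matters: writing `(num_heads, 3, head_dim)` instead would interleave Q, K and V features across heads without raising any error.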