Transformer from Scratch · Transformer Block
Feed Forward Network
Introduction
You will understand why a position-wise feed-forward network follows attention, how it expands and contracts the representation dimension, and how to keep it compatible with the residual path.
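As a rough preview of those three ideas, the sketch below applies a feed-forward network to each position independently, expands the width from d_model to d_ff, contracts it back, and adds the result to its input. It is a minimal illustration under stated assumptions, not the lesson's implementation: the 4x expansion factor, the ReLU activation, the random weights, and all names (feed_forward, w1, w2, d_ff) are choices made here for illustration, and layer normalization is left out.

```python
import random


def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def add_bias(x, b):
    """Add the bias vector b to every row of x."""
    return [[v + bi for v, bi in zip(row, b)] for row in x]


def relu(x):
    """Element-wise ReLU (assumed activation for this sketch)."""
    return [[max(0.0, v) for v in row] for row in x]


def feed_forward(x, w1, b1, w2, b2):
    """Apply the same two-layer network to every position (row) of x."""
    hidden = relu(add_bias(matmul(x, w1), b1))  # expand: (seq_len, d_ff)
    return add_bias(matmul(hidden, w2), b2)     # contract: (seq_len, d_model)


def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; real models learn these values."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model = 3, 4
d_ff = 4 * d_model  # assumed expansion factor, a common but not mandatory choice

x = random_matrix(seq_len, d_model, rng, scale=1.0)  # stand-in for attention output
w1, b1 = random_matrix(d_model, d_ff, rng), [0.0] * d_ff
w2, b2 = random_matrix(d_ff, d_model, rng), [0.0] * d_model

ffn_out = feed_forward(x, w1, b1, w2, b2)

# The residual add only works because the output width matches the input width.
block_out = [[xi + fi for xi, fi in zip(x_row, f_row)]
             for x_row, f_row in zip(x, ffn_out)]
```

Because the second projection returns to d_model, the element-wise sum with x is well defined. Running feed_forward on a single row gives the same values as that row of the full output, which is what "position-wise" means here.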