Gated Linear Units (GLU)
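Replaces the first linear projection and activation of the standard Transformer FFN layer with the elementwise product of two linear projections, one of which passes through a gating nonlinearity (sigmoid in the original GLU). The gate introduces a third weight matrix, so the hidden width is usually reduced to about two thirds of its original size to keep the parameter count equal to that of the standard FFN.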
Category
Abstraction level
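A minimal sketch of the idea in plain Python may make the parameter accounting concrete. It assumes a bias-free formulation with a sigmoid gate, and all names (`ffn_relu`, `ffn_glu`, `W_gate`, `W_value`, `W_out`) and sizes are hypothetical, chosen only for illustration. The GLU variant uses three matrices instead of two, so the sketch shrinks its hidden width to two thirds of the baseline so that both layers hold the same number of weights.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def ffn_relu(x, W1, W2):
    """Baseline FFN (no biases): W2 @ relu(W1 @ x)."""
    h = [max(0.0, a) for a in matvec(W1, x)]
    return matvec(W2, h)


def ffn_glu(x, W_gate, W_value, W_out):
    """GLU FFN (no biases): W_out @ (sigmoid(W_gate @ x) * (W_value @ x))."""
    gate = [sigmoid(a) for a in matvec(W_gate, x)]
    value = matvec(W_value, x)
    h = [g * v for g, v in zip(gate, value)]  # elementwise gated product
    return matvec(W_out, h)


def count_params(*matrices):
    return sum(len(W) * len(W[0]) for W in matrices)


rng = random.Random(0)
d_model, d_ff = 6, 24        # illustrative sizes only
d_ff_glu = 2 * d_ff // 3     # 16: narrower hidden layer offsets the third matrix

W1 = random_matrix(d_ff, d_model, rng)
W2 = random_matrix(d_model, d_ff, rng)
W_gate = random_matrix(d_ff_glu, d_model, rng)
W_value = random_matrix(d_ff_glu, d_model, rng)
W_out = random_matrix(d_model, d_ff_glu, rng)

x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
y_relu = ffn_relu(x, W1, W2)
y_glu = ffn_glu(x, W_gate, W_value, W_out)

params_relu = count_params(W1, W2)               # 2 * d_model * d_ff
params_glu = count_params(W_gate, W_value, W_out)  # 3 * d_model * d_ff_glu
```

With these sizes both layers hold 288 weights, which is the sense in which the gated layer can be compared to the baseline at equal parameter count. Swapping `sigmoid` for another activation would give the other GLU-family variants under the same structure.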