Architecture

Luong Attention

2015 · Historical · Published: 28 May 2026 · Updated: 28 May 2026

Key innovation

Luong attention simplifies and systematises attention for neural machine translation (NMT). It defines global and local variants and offers multiplicative (dot-product) scoring as an alternative to additive scoring.

How it works

The global Luong variant computes a score between the current decoder state and each encoder state, normalises scores with softmax, and forms the context vector as a weighted sum of encoder states. The local variant first predicts a central source position and then attends only within a window around that position. The score function can be dot, general or concat.
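The global variant described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`score`, `global_attention`), the list-based vectors, and the parameter names `W_a` and `v_a` are assumptions chosen for readability. The three score forms follow the usual statement of dot, general and concat scoring.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score(h_t, h_s, kind="dot", W_a=None, v_a=None):
    """Score one decoder state h_t against one encoder state h_s.

    W_a and v_a are hypothetical learned parameters; in a real model
    they would be trained, here they are supplied by the caller.
    """
    if kind == "dot":
        return dot(h_t, h_s)
    if kind == "general":
        # h_t^T W_a h_s
        return dot(h_t, matvec(W_a, h_s))
    if kind == "concat":
        # v_a^T tanh(W_a [h_t; h_s])
        hidden = [math.tanh(x) for x in matvec(W_a, list(h_t) + list(h_s))]
        return dot(v_a, hidden)
    raise ValueError(f"unknown score function: {kind}")


def global_attention(h_t, encoder_states, kind="dot", W_a=None, v_a=None):
    """Return (attention weights, context vector) over all encoder states."""
    scores = [score(h_t, h_s, kind, W_a, v_a) for h_s in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted sum of the encoder states.
    context = [
        sum(w * h_s[i] for w, h_s in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy usage: three encoder states of dimension 2, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_t = [1.0, 0.0]
weights, context = global_attention(h_t, encoder_states, kind="dot")
print(weights, context)
```

With dot scoring no extra parameters are needed, which is part of why it is the cheapest of the three. The general and concat forms each require a weight matrix, and concat also requires the vector `v_a`.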

Problem solved

It lowers the computational cost of attention in seq2seq models and simplifies its construction. The local variant additionally restricts the number of source positions scored at each decoding step.
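The saving from the local variant can be illustrated with a short sketch: only the positions inside a window of half-width `D` around the predicted centre are scored, so at most 2D + 1 encoder states are touched per step regardless of source length. The code below assumes the predictive form of local attention, in which the centre is a sigmoid-scaled function of the decoder state and the window weights are damped by a Gaussian with standard deviation D / 2. The names `W_p`, `v_p`, `predict_position` and `local_attention` are hypothetical, and dot scoring is used for brevity.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def predict_position(h_t, W_p, v_p, src_len):
    """Predict a real-valued source position in [0, src_len].

    W_p and v_p are hypothetical learned parameters.
    """
    hidden = [math.tanh(dot(row, h_t)) for row in W_p]
    x = dot(v_p, hidden)
    sigmoid = 0.5 * (1.0 + math.tanh(0.5 * x))  # overflow-safe sigmoid
    return src_len * sigmoid


def local_attention(h_t, encoder_states, W_p, v_p, D=2):
    """Attend only within [centre - D, centre + D] around the predicted position."""
    if D < 1:
        raise ValueError("window half-width D must be at least 1")
    src_len = len(encoder_states)
    p_t = predict_position(h_t, W_p, v_p, src_len)

    # Clamp the integer window to valid source indices.
    centre = min(src_len - 1, int(p_t))
    lo = max(0, centre - D)
    hi = min(src_len - 1, centre + D)
    positions = list(range(lo, hi + 1))

    # Score and normalise inside the window only (dot scoring here).
    align = softmax([dot(h_t, encoder_states[s]) for s in positions])

    # Gaussian damping favours positions near p_t; the damped weights
    # are not renormalised, so they need not sum to 1.
    sigma = D / 2.0
    weights = [
        a * math.exp(-((s - p_t) ** 2) / (2.0 * sigma ** 2))
        for a, s in zip(align, positions)
    ]

    dim = len(encoder_states[0])
    context = [
        sum(w * encoder_states[s][i] for w, s in zip(weights, positions))
        for i in range(dim)
    ]
    return p_t, positions, weights, context


# Toy usage: ten encoder states, but at most 2 * D + 1 = 5 are scored.
encoder_states = [[math.sin(i), math.cos(i)] for i in range(10)]
W_p = [[0.3, -0.2], [0.1, 0.4], [-0.5, 0.2]]
v_p = [0.7, -0.3, 0.2]
p_t, positions, weights, context = local_attention(
    [0.5, -1.0], encoder_states, W_p, v_p, D=2
)
print(p_t, positions)
```

The window size is fixed by `D`, so the per-step cost no longer grows with the length of the source sentence, whereas the global variant scores every encoder state.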