News, May 6, 2026
Google boosts Gemma 4 inference up to 3x with speculative decoding
On May 6, 2026, Google released experimental Multi-Token Prediction (MTP) drafter models for the Gemma 4 family, speeding up local inference by up to 3x with no loss of output quality. The technique builds on speculative decoding: a lightweight draft model proposes several future tokens, which the main model then verifies in parallel, keeping only the tokens it accepts.
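To make the draft-and-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is a toy illustration, not Google's MTP implementation or any Gemma API: the function names (`speculative_decode`, `greedy_decode`, `toy_target`, `toy_draft`) and the draft length `k` are hypothetical, and the "models" are simple arithmetic rules standing in for real networks. In a real system the verification step would be one batched forward pass of the main model rather than a Python loop.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to the next token.
# This is an assumption made for illustration only.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: the main model generates one token per step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: draft k tokens, keep the verified prefix."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    while len(tokens) < goal:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The main model checks each position. (Simulated sequentially
        #    here; a real implementation scores all positions at once.)
        accepted: List[int] = []
        for i in range(k + 1):
            expected = target(tokens + accepted)
            accepted.append(expected)
            # Stop at the first mismatch: the main model's own token
            # replaces the rejected draft token. If all k drafts match,
            # the (k+1)-th iteration adds one extra "bonus" token.
            if i == k or proposal[i] != expected:
                break

        tokens.extend(accepted)
    return tokens[:goal]


def toy_target(ctx: List[int]) -> int:
    """Stand-in for the main model (hypothetical rule)."""
    return (ctx[-1] * 3 + 1) % 11


def toy_draft(ctx: List[int]) -> int:
    """Stand-in for the drafter: agrees with the target most of the time."""
    return toy_target(ctx) if ctx[-1] % 4 else 0
```

Because every emitted token is the one the main model would have chosen itself, `speculative_decode(toy_target, toy_draft, [1], 20)` returns the same sequence as `greedy_decode(toy_target, [1], 20)`. Any speedup comes from accepting several draft tokens per verification round, so it depends on how often the drafter agrees with the main model.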