Gated DeltaNet

Linear-transformer architecture combining the delta rule with gating, an improvement over Mamba2 and DeltaNet (NVIDIA Research, MIT CSAIL, ICLR 2025).

🔬 Research · 🔬 Research only · LLM

Parameters

0.4B – 1.3B parameters (research scale)

Release date

9 December 2024

🔬 NVIDIA (research lab) · 🔬 MIT CSAIL (research lab)

Access: Download · Deployment: 💻 Local

Overview

Gated DeltaNet is a sequence architecture from the linear-transformer family, developed by Songlin Yang (MIT CSAIL) together with Jan Kautz and Ali Hatamizadeh (NVIDIA Research). The paper "Gated Delta Networks: Improving Mamba2 with Delta Rule" (arXiv:2412.06464) was submitted on 9 December 2024 and accepted to ICLR 2025. The model combines two complementary mechanisms: gating, which enables rapid memory erasure, and the delta update rule, which allows precise, targeted modifications of the state.
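To make the interplay of the two mechanisms concrete, here is a minimal single-head sketch of the gated delta rule recurrence in the form the paper gives it, S_t = S_{t-1} (alpha_t (I - beta_t k_t k_t^T)) + beta_t v_t k_t^T, with the output read as o_t = S_t q_t. It is written in plain Python for readability and is not the authors' implementation: the function names `gated_delta_step` and `read` are hypothetical, the keys are assumed to be unit-norm, and the gate values 0 and 1 are limiting cases chosen only to make the effect visible (in practice alpha_t and beta_t lie strictly between 0 and 1 and are computed from the input).

```python
# Illustrative sketch only; names and demo values are assumptions, not the authors' code.
# The state S is a (d_v x d_k) matrix stored as a list of rows.

def gated_delta_step(S, k, v, alpha, beta):
    """One step of S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T.

    alpha: decay gate (alpha -> 0 erases the whole memory).
    beta:  write strength (beta -> 1 fully replaces the value stored under k).
    """
    d_v, d_k = len(S), len(k)
    # Value the memory currently returns for key k, i.e. S_{t-1} k.
    old_v = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    return [
        [alpha * (S[i][j] - beta * old_v[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d_k)]
        for i in range(d_v)
    ]


def read(S, q):
    """Query the memory: o = S q."""
    return [sum(row[j] * q[j] for j in range(len(q))) for row in S]


k1, k2 = [1.0, 0.0], [0.0, 1.0]           # orthonormal keys (assumed)
S = [[0.0, 0.0], [0.0, 0.0]]              # empty memory

S = gated_delta_step(S, k1, [1.0, 2.0], alpha=1.0, beta=1.0)  # store v under k1
first_read = read(S, k1)

S = gated_delta_step(S, k1, [3.0, 4.0], alpha=1.0, beta=1.0)  # delta rule: overwrite k1
after_overwrite = read(S, k1)

S = gated_delta_step(S, k2, [5.0, 6.0], alpha=0.0, beta=1.0)  # gate: erase, then write k2
after_erase_k1 = read(S, k1)
after_erase_k2 = read(S, k2)
```

In this toy run the second write replaces the value stored under `k1` rather than adding to it (the delta rule), while the third step with `alpha=0.0` clears everything previously stored before writing the new pair (gating).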

The authors release only the PyTorch code, under the NVIDIA Source Code License-NC (non-commercial); pretrained weights are not provided. The architecture has been integrated into the Flash Linear Attention library and adopted in models such as Qwen3-Next and OLMo Hybrid. Experiments in the paper used models at the 0.4B and 1.3B parameter scale trained on FineWeb-Edu and SlimPajama-627B.

Classification

LLM

Access & deployment

Download

Local

Weights: Closed

Key parameters

🧩 Parameters: 0.4B – 1.3B (research scale)

📥 Input: text

Technical specification

Parameters

0.4B – 1.3B parameters (research scale)

License

NVIDIA Source Code License-NC

Modalities

⬇ Input

text

⬆ Output

text

Capabilities and applications

Native model capabilities

Language modeling

Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.

Category: language

Long context

Support for large context windows of tens to hundreds of thousands (or even millions) of input tokens. This enables analysis of entire codebases, long documents, and many parallel conversations without losing earlier information. For comparison, GPT-5.1 supports 400,000 tokens.
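One reason linear-transformer architectures such as Gated DeltaNet are of interest for long context is that they keep a fixed-size recurrent state instead of a key-value cache that grows with the sequence. The following back-of-the-envelope sketch compares the two. All dimensions (24 layers, 16 heads, head size 128, 2 bytes per element) are hypothetical values picked for illustration and are not taken from the paper; the helper names are likewise made up.

```python
# Rough memory arithmetic under assumed, hypothetical model dimensions.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Softmax attention: keys and values are cached for every past token."""
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem


def recurrent_state_bytes(n_layers, n_heads, d_k, d_v, bytes_per_elem=2):
    """Linear attention: one (d_v x d_k) state matrix per head per layer."""
    return n_layers * n_heads * d_k * d_v * bytes_per_elem


LAYERS, HEADS, HEAD_DIM = 24, 16, 128     # hypothetical configuration

state = recurrent_state_bytes(LAYERS, HEADS, HEAD_DIM, HEAD_DIM)
for seq_len in (4_096, 65_536, 1_048_576):
    kv = kv_cache_bytes(seq_len, LAYERS, HEADS, HEAD_DIM)
    print(f"{seq_len:>9} tokens: KV cache {kv / 2**20:10.0f} MiB, "
          f"recurrent state {state / 2**20:.0f} MiB")
```

The cache grows linearly with the number of tokens while the recurrent state stays constant. The trade-off is that a fixed-size state cannot retain every detail of an arbitrarily long input, so this arithmetic says nothing about recall quality.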

Category: language

Technical architecture

Model Form

LLM

Training Techniques

Pretraining

Sources and related pages

3 sources

Paper: Gated Delta Networks: Improving Mamba2 with Delta Rule (ICLR 2025), arxiv.org
Repo: NVlabs/GatedDeltaNet (GitHub), github.com
Paper: OpenReview – ICLR 2025, openreview.net
