Gated DeltaNet

Linear-transformer architecture combining the delta rule with gating, an improvement over Mamba2 and DeltaNet (NVIDIA Research, MIT CSAIL, ICLR 2025).
Research · Research only · LLM
Parameters
0.4B – 1.3B parameters (research scale)
Release date
9 December 2024
Access: Download
Deployment: Local

Overview

Gated DeltaNet is a sequence architecture from the linear-transformer family, developed by Songlin Yang (MIT CSAIL), Jan Kautz and Ali Hatamizadeh (NVIDIA Research). The paper "Gated Delta Networks: Improving Mamba2 with Delta Rule" (arXiv:2412.06464) was submitted on 9 December 2024 and accepted to ICLR 2025. The model combines two complementary mechanisms: gating, which enables rapid memory erasure, and the delta update rule, which allows precise, targeted modifications of the memory state.
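To make the interplay of the two mechanisms concrete, here is a minimal sketch of a single recurrent update, assuming the gated delta rule in its commonly stated form S_t = α_t · S_{t-1} · (I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ. The names `gated_delta_step`, `read`, `alpha`, and `beta` are illustrative choices, not taken from the released code, and the step-by-step loop is for exposition only; it is not the hardware-efficient parallel training algorithm described in the paper.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One illustrative update of a memory matrix S (d_v rows, d_k columns).

    alpha in [0, 1]: gate that uniformly decays the old state (0 erases it).
    beta  in [0, 1]: write strength of the delta-rule correction.
    k is assumed to be L2-normalised; S, k, v are plain Python lists.
    """
    # Gating: fade the whole memory by alpha.
    decayed = [[alpha * s for s in row] for row in S]
    # Value the decayed memory currently returns for key k.
    v_old = [sum(s * kj for s, kj in zip(row, k)) for row in decayed]
    # Delta rule: move the stored value for k towards v by a factor beta.
    return [
        [s + beta * (vi - oi) * kj for s, kj in zip(row, k)]
        for row, vi, oi in zip(decayed, v, v_old)
    ]


def read(S, q):
    """Query the memory: returns S @ q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in S]


# Usage: write a value, overwrite it under the same key, then erase everything.
S = [[0.0] * 4 for _ in range(3)]
k = [1.0, 0.0, 0.0, 0.0]

S = gated_delta_step(S, k, [1.0, 2.0, 3.0], alpha=1.0, beta=1.0)
first = read(S, k)            # the value just written

S = gated_delta_step(S, k, [-1.0, 0.0, 5.0], alpha=1.0, beta=1.0)
second = read(S, k)           # the old association is replaced, not summed

S_erased = gated_delta_step(S, k, [0.0, 0.0, 0.0], alpha=0.0, beta=0.0)
```

With `alpha = 1` the update reduces to the plain delta rule (targeted overwrite of one key-value association), while with `beta = 0` only the decay remains, so a small `alpha` clears the memory in one step.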

The authors release only the PyTorch code, under the NVIDIA Source Code License-NC (non-commercial); pretrained weights are not provided. The architecture has been integrated into the Flash Linear Attention library and adopted in models such as Qwen3-Next and OLMo Hybrid. Experiments in the paper used models at the 0.4B and 1.3B parameter scale trained on FineWeb-Edu and SlimPajama-627B.

Classification
LLM
Access & deployment
Download
Local
Weights: Closed
Key parameters
Parameters: 0.4B – 1.3B (research scale)
Input: text

Technical specification

Parameters
0.4B – 1.3B parameters (research scale)
License
NVIDIA Source Code License-NC
Modalities
Input
text
Output
text

Capabilities and applications

Native model capabilities
Language modeling
Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.
Category: language
Long context
The model's ability to handle long context and maintain coherence over a large amount of input data.
Category: reasoning

Technical architecture

Model Form
Training Techniques