Mamba-2

2 · Family: Mamba
Second generation of the Mamba architecture (selective SSM), built around the SSD layer. Its core layer is 2–8× faster than Mamba's, and the model remains competitive with Transformers on language modeling.
🔬 Research · 🔬 Research only · ⚖ Open source · LLM · 📁 Mamba
Parameters: 130M – 2.7B
Release date: 31 May 2024
Access: Download
Deployment: 💻 Local

Overview

Mamba-2 is a language model architecture developed by Tri Dao (Princeton University) and Albert Gu (Carnegie Mellon University), published on 31 May 2024 in the paper "Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality" (ICML 2024, arXiv:2405.21060). Its core is the SSD (Structured State Space Duality) layer, a refinement of the selective SSM from the original Mamba.
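The duality in the paper's title can be illustrated with a toy example. The sketch below is not the paper's chunked, hardware-efficient algorithm. It assumes a single input channel and a scalar decay `a[t]` per step, and the function names are made up for illustration. It computes the same output twice: once as a linear-time recurrence over a fixed-size state, and once as an attention-like matrix product with a causal decay mask.

```python
import numpy as np

def ssm_recurrent(a, B, C, x):
    """Linear-time form: carry a size-N state h across T steps."""
    T, N = B.shape
    h = np.zeros(N)
    y = np.empty(T)
    for t in range(T):
        h = a[t] * h + B[t] * x[t]   # decay the state, then write the input
        y[t] = C[t] @ h              # read the state out
    return y

def ssm_quadratic(a, B, C, x):
    """Attention-like form: y = (L * (C @ B.T)) @ x with a decay mask L."""
    cum = np.cumsum(np.log(a))
    # L[i, j] = a[j+1] * ... * a[i] for i >= j, and 0 above the diagonal.
    L = np.tril(np.exp(cum[:, None] - cum[None, :]))
    return (L * (C @ B.T)) @ x

# Toy sizes chosen only for illustration.
rng = np.random.default_rng(0)
T, N = 16, 4
a = rng.uniform(0.5, 1.0, size=T)    # per-step scalar decay in (0, 1)
B = rng.standard_normal((T, N))
C = rng.standard_normal((T, N))
x = rng.standard_normal(T)

y_rec = ssm_recurrent(a, B, C, x)
y_quad = ssm_quadratic(a, B, C, x)
print("max abs difference:", np.max(np.abs(y_rec - y_quad)))
```

The recurrent form keeps only an N-dimensional state regardless of sequence length, while the quadratic form builds a T×T matrix, much like masked attention without the softmax.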

Base model weights (130M, 370M, 780M, 1.3B, 2.7B parameters) are published on Hugging Face under the state-spaces organization with the Apache-2.0 license. The models were trained on 300B tokens from The Pile.
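A minimal sketch of how one of these checkpoints might be loaded, assuming the `mamba_ssm` package from the state-spaces/mamba repository, a CUDA-capable PyTorch install, and repository ids of the form `state-spaces/mamba2-<size>`. The helper names `mamba2_repo_id` and `load_mamba2` are hypothetical. The GPT-NeoX tokenizer is also an assumption, taken from that repository's example scripts.

```python
# Sizes listed above; the repo id pattern is an assumption.
MAMBA2_SIZES = ("130m", "370m", "780m", "1.3b", "2.7b")

def mamba2_repo_id(size: str) -> str:
    """Map a size string such as '130m' to an assumed Hugging Face repo id."""
    if size not in MAMBA2_SIZES:
        raise ValueError(f"unknown size {size!r}, expected one of {MAMBA2_SIZES}")
    return f"state-spaces/mamba2-{size}"

def load_mamba2(size: str = "130m", device: str = "cuda"):
    """Download a checkpoint and return (model, tokenizer)."""
    # Imported lazily so the helpers above work without these packages.
    import torch
    from transformers import AutoTokenizer
    from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

    model = MambaLMHeadModel.from_pretrained(
        mamba2_repo_id(size), device=device, dtype=torch.float16
    )
    # Assumed tokenizer: the one used in the mamba repository's scripts.
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
    return model, tokenizer

# Usage (needs a GPU and network access):
#   model, tokenizer = load_mamba2("130m")
```

Loading is kept inside a function so that nothing is downloaded on import.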

Classification
LLM
Family: Mamba
Access & deployment
Access: Download
Deployment: Local
Weights: Open source
Key parameters
🧩 Parameters: 130M – 2.7B
📥 Input: text

Technical specification

Parameters: 130M – 2.7B
License: Apache-2.0
Modalities
⬇ Input: text
⬆ Output: text

Capabilities and applications

Native model capabilities
Language modeling
Ability to predict subsequent tokens and generate coherent natural-language text based on the preceding context.
Category: language
Long context
Ability to process long input sequences and stay coherent across a large amount of preceding context.
Category: reasoning

Technical architecture

Core Architecture
Model Form
Training Techniques