Robots Atlas>ROBOTS ATLAS
DBRX Base

DBRX Base

DBRX Baseย ยทย Family: DBRX
Base pretrained DBRX model without instruction tuning. 132B total parameters, 36B active (MoE 16 experts, top-4). Pretrained on 12T tokens, 32K context window.
โœ“ Activeโœ“ Public accessโš– Open weightsLLM๐Ÿ“ DBRX
Context window
32K
tokens
Parameters
132B total / 36B active
parameters
Max output
32,000
tokens
Release date
27 March 2024
Access:APIDownloadDeployment:โ˜ Cloud๐Ÿ’ป Local

Overview

DBRX Base is the foundation pretrained language model in the DBRX family, released on March 27, 2024 by Databricks under the open Databricks Open Model License. Unlike DBRX Instruct, it has not been instruction-tuned โ€” it serves as a foundation for further fine-tuning or continued pretraining on enterprise data.

Architecture

DBRX Base is a decoder-only Transformer with fine-grained Mixture of Experts (MoE): 132B total parameters, 36B active per token, 16 experts of which 4 are selected per input. It uses Rotary Position Encodings (RoPE), Gated Linear Units (GLU), Grouped Query Attention (GQA), and the GPT-4 tokenizer from the tiktoken repository. Maximum context length is 32,768 tokens.

Training Data and Compute

DBRX Base was pretrained on 12 trillion tokens of carefully curated text and code data, with curriculum learning (mix changes during training). Data processing relied on Apache Spark, Databricks Notebooks, and Unity Catalog. Training ran on 3,072 NVIDIA H100 GPUs connected via 3.2 Tbps InfiniBand, using MegaBlocks, LLM Foundry, Composer, and Streaming. Databricks estimates the new pretraining data is at least 2x better token-for-token than the data used for the MPT family.

Use Cases

DBRX Base is intended primarily for advanced users: ML teams that want to perform their own instruction tuning, RLHF, or continued pretraining on domain-specific data. For typical chat and instruction-following workloads, Databricks recommends DBRX Instruct.

Classification
LLM
Family: DBRX
Access & deployment
APIDownload
CloudLocal
Weights: Open weights
Key parameters
๐Ÿ“ Context: 32K
๐Ÿงฉ Parameters: 132B total / 36B active
โœ“ Fine-tuning
๐Ÿ“ฅ Input: text

Technical specification

Context window
32K
tokens
Parameters
132B total / 36B active
parameters
Max output tokens
32,000
tokens per response
Knowledge cutoff
1 Dec 2023
Knowledge boundary
License
Databricks Open Model License
Hardware requirements
Training: 3,072x NVIDIA H100 connected by 3.2 Tbps InfiniBand. Inference: enterprise-class GPUs (e.g. 8x H100 or A100) with TensorRT-LLM; 8-bit quantization supported.
Features:โœ“ Fine-tuning
Modalities
โฌ‡ Input
text
โฌ† Output
textcode

Capabilities and applications

Native model capabilities
Coding
Generating, analysing and modifying source code.
Category: coding
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Long context
Maintaining coherence and focus across very long input context.
Category: language
Multilingual
Understanding and generating text in many languages.
Category: language

Benchmark results

1 benchmark
MMLU
accuracy ยท 5-shot
73.7%
๐Ÿ“„ Databricks DBRX blog (2024-03-27)
Score from Table 1 in the DBRX blog (DBRX Instruct). DBRX Base scores not separately reported.

Technical architecture

Model Form

Deployment and security

โ˜ Available on platforms