Robots Atlas

Safetensors

A binary tensor serialization format using a JSON header and raw numeric data. Deserialization executes no code, eliminating the arbitrary-code-execution risk inherent in pickle-based formats (e.g., PyTorch .pt/.bin), while supporting memory-mapped I/O and selective tensor loading.

Category
Abstraction level
Operation level
Storing AI model weights and checkpoints
Safe model distribution in hubs and repositories
Loading models in production environments with elevated security requirements
Publishing open-weight models without using pickle
Fast model inference and serving
Converting legacy checkpoints to a safer format
Integration with deployment and inference tools

Safetensors stores tensors in a simple binary format with a metadata header describing tensor names, data types, shapes, and offsets within the file. This allows the library to read file contents without executing any code embedded in the file. The format is designed to support fast data access and enables zero-copy or partial memory-mapping scenarios, depending on the framework and environment in use. Implementations exist for major ecosystems including PyTorch, TensorFlow, JAX, PaddlePaddle, and NumPy.
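The layout described above can be reproduced with a short standard-library sketch (no safetensors dependency; the function name and file path are illustrative):

```python
import json
import struct

def save_minimal(path, tensors):
    """Write {name: (dtype, shape, raw_bytes)} in the safetensors layout:
    uint64 LE header size, UTF-8 JSON header, then the packed data buffer."""
    header, chunks, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                       "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte size field
        f.write(header_bytes)
        f.write(b"".join(chunks))

# Two float32 values packed in C order; "demo.safetensors" is illustrative
raw = struct.pack("<2f", 1.0, 2.0)
save_minimal("demo.safetensors", {"w": ("F32", [2], raw)})
```

In practice the official library (safetensors.torch.save_file and friends) should be used; the sketch only illustrates the three on-disk regions.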

Safetensors addresses the unsafe deserialization of model weights in formats such as pickle and the associated security risks. Traditional checkpoint formats can execute arbitrary code during loading, which poses a threat when downloading models from external sources. Additionally, many legacy formats were not designed for fast, simple, and predictable large-scale tensor access. Safetensors mitigates these risks by providing a secure, straightforward, and efficient format for storing tensors.

01

Header Size Field

Locates the JSON header and validates file boundaries before parsing.

First 8 bytes of the safetensors file. Stores the JSON header size as a 64-bit unsigned integer (uint64) in little-endian byte order. Enables immediate location of the JSON header without parsing tensor data.

i/o
in
[8 bytes] Raw binary: 8 bytes at file offset 0.
out
uint64. Size of the JSON header in bytes (N). Maximum enforced at 100 MB.
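A stdlib sketch of reading and validating this field (the 100 MB constant matches the limit stated above; file names are illustrative):

```python
import struct

MAX_HEADER_SIZE = 100_000_000  # 100 MB cap, as enforced by the format

def read_header_size(path):
    """Read the uint64 little-endian header-size field at file offset 0."""
    with open(path, "rb") as f:
        prefix = f.read(8)
    if len(prefix) != 8:
        raise ValueError("file too short for the 8-byte size field")
    (n,) = struct.unpack("<Q", prefix)
    if n > MAX_HEADER_SIZE:
        raise ValueError(f"header size {n} exceeds the 100 MB limit")
    return n

# Hand-built demo file whose (fake) header is 16 bytes long
with open("hdr_demo.bin", "wb") as f:
    f.write(struct.pack("<Q", 16) + b"x" * 16)
print(read_header_size("hdr_demo.bin"))  # 16
```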
02

JSON Header

Stores tensor metadata (names, types, shapes, offsets), enabling selective loading without accessing raw data.

Variable-length UTF-8 JSON section immediately following the header size field. Contains a dictionary mapping tensor names to their dtype (e.g., F16, BF16, F32), shape (array of dimension integers), and data_offsets ([BEGIN, END] relative to the start of the data region). Optional __metadata__ key stores arbitrary string-to-string pairs. Size bounded to 100 MB by MAX_HEADER_SIZE.
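The header can be parsed without touching any tensor data, as this stdlib sketch shows (the hand-built file and its metadata values are illustrative):

```python
import json
import struct

def parse_header(path):
    """Return (tensor entries, optional __metadata__) from a safetensors file."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n).decode("utf-8"))
    metadata = header.pop("__metadata__", {})  # string-to-string pairs, optional
    return header, metadata

# Hand-built file: one 2x2 F32 tensor (16 data bytes) plus metadata
hdr = json.dumps({"__metadata__": {"format": "demo"},
                  "w": {"dtype": "F32", "shape": [2, 2],
                        "data_offsets": [0, 16]}}).encode("utf-8")
with open("hdr_json.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(hdr)) + hdr + b"\x00" * 16)

tensors, meta = parse_header("hdr_json.safetensors")
print(tensors["w"]["shape"], meta["format"])  # [2, 2] demo
```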

03

Tensor Data Buffer

Stores raw tensor numerical data in a memory-mappable format.

Contiguous block of raw bytes storing all tensor data in C (row-major) order, without compression or padding between tensors. Offsets from the JSON header are relative to the start of this buffer (not the file start). Tensors must be packed (contiguous) before serialization; striding is not supported.
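Because offsets are relative to the data region, a single tensor can be read with one seek past the header, as in this stdlib sketch (file and tensor names are illustrative):

```python
import json
import struct

def read_tensor_bytes(path, name):
    """Read only one tensor's bytes; offsets are relative to the data region."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        entry = json.loads(f.read(n))[name]
        begin, end = entry["data_offsets"]
        f.seek(8 + n + begin)           # data region starts at 8 + header size
        return f.read(end - begin)

# Build a file with two packed F32 tensors, no padding between them
a = struct.pack("<2f", 1.0, 2.0)       # tensor "a": offsets [0, 8]
b = struct.pack("<3f", 3.0, 4.0, 5.0)  # tensor "b": offsets [8, 20]
hdr = json.dumps({
    "a": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
    "b": {"dtype": "F32", "shape": [3], "data_offsets": [8, 20]},
}).encode("utf-8")
with open("buf_demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(hdr)) + hdr + a + b)

vals = struct.unpack("<3f", read_tensor_bytes("buf_demo.safetensors", "b"))
print(vals)  # (3.0, 4.0, 5.0)
```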

Time
…

n is the total file size in bytes. JSON header parsing is O(H), where H is the header size (typically H << n). Accessing a single tensor via memory-mapping is O(1) after header parsing. Loading k tensors is O(s_1 + … + s_k), the combined size of the selected tensors.

Full loading of all tensors (load_file) takes time proportional to the total data size. Selective loading (safe_open + get_tensor) provides O(1) access to individual tensors after a one-time O(H) header parse.
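The safe_open + get_tensor access pattern can be imitated with stdlib mmap: one O(H) header parse, then each tensor is an O(1) slice into the mapping. A hedged sketch (the LazyFile class is hypothetical, not the library's API):

```python
import json
import mmap
import struct

class LazyFile:
    """Minimal safe_open-style reader: parse header once, slice tensors lazily."""
    def __init__(self, path):
        self._f = open(path, "rb")
        self._mm = mmap.mmap(self._f.fileno(), 0, access=mmap.ACCESS_READ)
        (n,) = struct.unpack("<Q", self._mm[:8])
        self._header = json.loads(self._mm[8:8 + n])
        self._header.pop("__metadata__", None)
        self._data_start = 8 + n

    def keys(self):
        return list(self._header)

    def get_bytes(self, name):
        """O(1) slice into the mapping; no full-file read."""
        begin, end = self._header[name]["data_offsets"]
        s = self._data_start
        return self._mm[s + begin:s + end]

# Demo file with a single four-element F32 tensor
hdr = json.dumps({"w": {"dtype": "F32", "shape": [4],
                        "data_offsets": [0, 16]}}).encode("utf-8")
with open("lazy_demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(hdr)) + hdr + struct.pack("<4f", 0, 1, 2, 3))

lf = LazyFile("lazy_demo.safetensors")
print(lf.keys())  # ['w']
print(struct.unpack("<4f", lf.get_bytes("w")))  # (0.0, 1.0, 2.0, 3.0)
```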

Memory complexity
…

n denotes the total size of all tensors. With memory-mapping (mmap), additional RAM usage is O(1) for lazy loading or O(k) when loading k selected tensors. The JSON header occupies O(T), where T is the number of tensors.

Memory-mapping avoids allocating the full file in RAM. On CPU, if the file is in the OS page cache, near-zero-copy access is possible. On GPU, a copy from host RAM to VRAM is always required.

Parallelism

Fully parallel

Tensors in the data buffer are independent, so they can be loaded in parallel by multiple threads or processes. JSON header parsing is sequential but takes negligible time compared to data loading. The format supports distributed loading: each node can load a different subset of tensors (tensor parallelism sharding, used in TGI).
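Since each tensor occupies a disjoint byte range, concurrent reads need no coordination. A stdlib sketch of parallel loading (function and file names are illustrative):

```python
import json
import struct
from concurrent.futures import ThreadPoolExecutor

def load_all_parallel(path, workers=4):
    """Read every tensor's byte range concurrently; header parse stays sequential."""
    with open(path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
    header.pop("__metadata__", None)
    data_start = 8 + n

    def fetch(item):
        name, entry = item
        begin, end = entry["data_offsets"]
        with open(path, "rb") as f:     # independent handle per task
            f.seek(data_start + begin)
            return name, f.read(end - begin)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fetch, header.items()))

# Demo file with two independent tensors
hdr = json.dumps({
    "a": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
    "b": {"dtype": "F32", "shape": [2], "data_offsets": [8, 16]},
}).encode("utf-8")
with open("par_demo.safetensors", "wb") as f:
    f.write(struct.pack("<Q", len(hdr)) + hdr + struct.pack("<4f", 1, 2, 3, 4))

loaded = load_all_parallel("par_demo.safetensors")
print(sorted(loaded))  # ['a', 'b']
```

The same partitioning underlies tensor-parallel sharding: each worker (or node) simply fetches its own subset of names.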

Common pitfalls

Shared tensors (memory-sharing tensors) in PyTorch
MEDIUM

PyTorch allows tensors sharing the same memory storage. The safetensors PyTorch adapter includes special logic for detecting and handling shared tensors. Serializing models with shared tensors without this handling may lead to data duplication or errors. After deserialization, memory sharing is lost; each tensor is independent.

Use the official safetensors.torch adapter, which handles shared tensors. Verify model integrity after conversion from .pt to .safetensors by comparing parameters.

No compression – large file sizes for low-entropy models
LOW

Safetensors does not use compression. Tensor data is stored as raw bytes. For models with low entropy (e.g., highly sparse weights or quantized models with many zeros), file size may be significantly larger than with compressed serialization formats.

If file size is critical, consider filesystem-level compression or archives (e.g. .tar.zst). The safetensors format does not support built-in compression per its specification.
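The text's .tar.zst suggestion needs an external zstandard tool; a stdlib sketch with gzip illustrates the same idea on a low-entropy file (the all-zero payload and file names are illustrative):

```python
import gzip
import os
import shutil
import struct

# Minimal file with an empty JSON header and a low-entropy (all-zero) data buffer
with open("zeros.safetensors", "wb") as f:
    f.write(struct.pack("<Q", 2) + b"{}" + b"\x00" * 100_000)

# Compress externally, since the format itself stores raw bytes only
with open("zeros.safetensors", "rb") as src, \
        gzip.open("zeros.safetensors.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

print(os.path.getsize("zeros.safetensors"),
      os.path.getsize("zeros.safetensors.gz"))
```

Note the trade-off: a compressed archive must be fully decompressed before the memory-mapping and selective-loading benefits apply again.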

Duplicate JSON header keys – inconsistent results across parsers
MEDIUM

The JSON specification does not formally define behavior for duplicate keys. The Trail of Bits audit found that the Hugging Face reference implementation rejects files with duplicate keys, but some third-party JSON parsers accept them with undefined behavior. A malicious file may thus behave differently across implementations.

Use only the official safetensors library for parsing. When implementing custom parsers, reject files with duplicate JSON keys during validation.
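In Python, the stdlib json module can be made to reject duplicates via object_pairs_hook (by default it silently keeps the last value). A sketch of such a validating parse:

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that fails on duplicate keys instead of keeping the last."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate JSON key: {key!r}")
        seen[key] = value
    return seen

good = '{"a": {"dtype": "F32"}}'
bad = '{"a": {"data_offsets": [0, 8]}, "a": {"data_offsets": [0, 99]}}'

print(json.loads(good, object_pairs_hook=reject_duplicates))
try:
    json.loads(bad, object_pairs_hook=reject_duplicates)
except ValueError as e:
    print(e)  # duplicate JSON key: 'a'
```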

No built-in data integrity verification (checksum/hash)
LOW

The safetensors format does not include a built-in data integrity mechanism (e.g., SHA-256 hash of tensors). File corruption during transmission or storage may not be detected at load time; the format validates structure and offsets but not data checksums.

Use external integrity mechanisms (e.g., SHA-256 file hashes distributed alongside the model). Hugging Face Hub provides a file hash for every model file.
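External verification is a one-liner with the stdlib hashlib; a sketch (the file name and its contents are illustrative, and in practice the digest would be compared against a hash published alongside the model):

```python
import hashlib

def sha256_file(path, chunk=1 << 20):
    """Stream a file through SHA-256 in 1 MB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

with open("model_demo.bin", "wb") as f:
    f.write(b"tensor bytes")

digest = sha256_file("model_demo.bin")
print(digest)
```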

2022

First public release of safetensors v0.0.1 (September 22, 2022, PyPI)

breakthrough

Nicolas Patry at Hugging Face published the first version of the safetensors library and format specification. Rust core, Python bindings via PyO3, PyTorch and NumPy support. Format designed as a secure and fast alternative to pickle.

2023

Trail of Bits security audit (May 2023) and adoption by Hugging Face Hub as the default format

breakthrough

Independent security audit by Trail of Bits, commissioned by Hugging Face, EleutherAI, and Stability AI. No critical vulnerabilities were found. Hugging Face Hub adopted safetensors as the preferred format, displaying warnings for pickle-format models.

2025

Integration of safetensors into PyTorch core as a native serialization option

breakthrough

PyTorch merged native safetensors support into its core serialization stack, complementing the weights_only restriction on torch.load. This marks institutional endorsement by the leading deep learning framework, relegating pickle-based checkpoints to legacy status.

Hardware agnostic (PRIMARY)

Safetensors is a file format, not a computational algorithm. The loading operation (JSON header parsing, memory-mapping) is hardware-agnostic and efficient both on CPU and as a preliminary stage before data transfer to GPU/TPU. The format supports framework-agnostic loading of the same tensors into PyTorch, TensorFlow, JAX, and MLX.

Copying to GPU memory (after loading from file) is a standard DMA operation independent of the file format. Memory-mapping benefits are most apparent on the CPU side.

Safetensors Documentation

Official documentation describing Safetensors as a simple, secure, and fast tensor storage format.

documentation · Hugging Face
huggingface/safetensors

Official Safetensors project repository.

repository · GitHub
Safetensors Security Audit

Description of the format's security properties, supported frameworks, and audit findings.

blog · Hugging Face
Safetensors in Text Generation Inference

Documentation for using Safetensors in model serving and tensor parallel sharding.

documentation · Hugging Face
safetensors on PyPI

Release history: the first public release, 0.0.1, appeared on September 22, 2022.

repository · PyPI