
Control · Control & Planning

OpenVLA

OpenVLA-mini (4 GB) · Stanford University

Active · Open source · API available
CATEGORY: Control · Control & Planning
READINESS: TRL 6
ADOPTION SCALE: Research / Prototype
LICENSES: MIT
FIRST RELEASE: 2024

**OpenVLA** is the first fully open-source replication of the RT-2 architecture, announced in June 2024 (paper 'OpenVLA: An Open-Source Vision-Language-Action Model', Kim et al., arXiv:2406.09246). It was developed jointly by Stanford AI Lab, UC Berkeley (Robot Learning Lab), Google DeepMind, Toyota Research Institute, MIT and Physical Intelligence. OpenVLA fills the gap left by the closed RT-2 — releasing the **model weights**, **training code**, **fine-tuning recipes**, and **complete data pipeline**.

Architecture: ~**7B parameters** built from three components. (1) **Vision encoder**: a fusion of DINOv2 ViT-L/14 (low-level spatial features) and SigLIP So400m/14 (CLIP-style semantic alignment), whose patch features are concatenated and projected into the LLM's input space. (2) **LLM backbone**: Llama 2 7B. (3) **Action head**: each action dimension is discretized into 256 bins (as in RT-2), and the model emits the resulting action tokens by next-token prediction.
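The 256-bin action discretization described above can be illustrated with a minimal sketch. The uniform binning, the `[-1, 1]` bounds, and the helper names `discretize` and `undiscretize` are assumptions made for illustration; the released model derives its per-dimension bounds from training-data statistics.

```python
N_BINS = 256  # bins per action dimension, as stated in the architecture description


def discretize(value, low, high, n_bins=N_BINS):
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)       # clamp out-of-range values
    width = (high - low) / n_bins              # width of one uniform bin
    return min(int((clipped - low) / width), n_bins - 1)


def undiscretize(index, low, high, n_bins=N_BINS):
    """Map a bin index back to the centre of its bin."""
    width = (high - low) / n_bins
    return low + (index + 0.5) * width


# Hypothetical normalized bounds and a hypothetical 7-DoF action
# (for example: position delta, rotation delta, gripper).
LOW, HIGH = -1.0, 1.0
action = [0.10, -0.25, 0.0, 0.5, -1.0, 1.0, 0.73]

tokens = [discretize(a, LOW, HIGH) for a in action]        # one token per dimension
recovered = [undiscretize(t, LOW, HIGH) for t in tokens]   # what the robot would execute
max_error = max(abs(a - r) for a, r in zip(action, recovered))

print("tokens:", tokens)
print("max round-trip error:", max_error)
```

With uniform bins the round-trip error is bounded by half a bin width, which is the precision cost of treating control as next-token prediction.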

Training data: ~970,000 demonstrations from **Open X-Embodiment** (Google DeepMind, 21 institutions), covering 22 robots (Franka, UR5, WidowX, Sawyer, Google Robot etc.) and ~500 tasks. Training time: 14 days on 64× A100 (about 21,500 A100-hours, as reported in the paper).

Results: OpenVLA achieves a **+16.5 pp** higher absolute task success rate than RT-2-X (55B) across 29 evaluation tasks and multiple robot embodiments, with roughly 7× fewer parameters. Fine-tuning on custom datasets (LoRA-style) takes 10-20 hours on 1× A100 and adapts the model to a new robot with 100-500 demonstrations.
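To give a feel for why LoRA-style fine-tuning fits on a single GPU, the sketch below counts the parameters a LoRA adapter would add to the Llama 2 7B backbone. The rank of 32 and the choice to adapt every linear projection are assumptions for illustration, not a statement of the official recipe, and the vision encoder is left out of the count.

```python
# Llama 2 7B dimensions from its public model configuration.
HIDDEN = 4096
INTERMEDIATE = 11008
N_LAYERS = 32


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def lora_params_llm(rank: int) -> int:
    """LoRA parameters when every linear projection in every decoder layer is adapted."""
    attn = 4 * lora_params(HIDDEN, HIDDEN, rank)              # q, k, v, o projections
    mlp = (2 * lora_params(HIDDEN, INTERMEDIATE, rank)        # gate and up projections
           + lora_params(INTERMEDIATE, HIDDEN, rank))         # down projection
    return N_LAYERS * (attn + mlp)


RANK = 32                 # assumed rank, for illustration only
TOTAL_PARAMS = 7e9        # nominal model size used in the text

trainable = lora_params_llm(RANK)
fraction = trainable / TOTAL_PARAMS

print(f"LoRA parameters: {trainable:,}")
print(f"share of nominal 7B: {fraction:.2%}")
```

Under these assumptions only about one percent of the weights receive gradients and optimizer state, which is where most of the memory saving over full fine-tuning comes from.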

Ecosystem: full integration with **HuggingFace Transformers** (`openvla/openvla-7b`), support for 4-bit quantization (bitsandbytes), compatibility with PyTorch 2.0+. Impact: OpenVLA has become the **de facto VLA baseline** in academia and the starting point for several later works (e.g. CogACT, TraceVLA). Reproducibility: full checkpoints, dataset indices, and training scripts.
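A minimal sketch of loading the checkpoint through HuggingFace Transformers follows. It mirrors the usage pattern in the upstream README as far as it is known here: the prompt template, `predict_action`, and the `unnorm_key="bridge_orig"` value come from that example and may differ between releases, while the helper names `build_prompt`, `load_openvla`, and `predict` are hypothetical. Heavy imports are kept inside the functions so the module can be imported without a GPU.

```python
def build_prompt(instruction: str) -> str:
    """Format a language instruction with the prompt template from the upstream README."""
    return f"In: What action should the robot take to {instruction}?\nOut:"


def load_openvla(model_id: str = "openvla/openvla-7b", device: str = "cuda:0"):
    """Load processor and model in bfloat16 (downloads several GB of weights)."""
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor

    # trust_remote_code is needed because the model class ships with the checkpoint.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).to(device)
    return processor, model


def predict(processor, model, image, instruction: str,
            unnorm_key: str = "bridge_orig", device: str = "cuda:0"):
    """Run one inference step; `image` is a PIL image from the robot camera."""
    import torch

    inputs = processor(build_prompt(instruction), image).to(device, dtype=torch.bfloat16)
    # predict_action is defined by the checkpoint's remote code, not by Transformers itself.
    return model.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)


if __name__ == "__main__":
    print(build_prompt("pick up the cup"))
```

The `unnorm_key` selects which dataset statistics are used to un-normalize the predicted action, so it should match the robot setup the model was trained or fine-tuned on.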

Type & Roles
Software types
VLA / Foundation Model
Runtime

A Runtime is the environment or execution layer used to run code, load libraries, manage dependencies, and operate applications or services — either in real time or during normal system operation. In robotics this includes real-time operating system (RTOS) runtimes, ROS 2 executor runtimes, containerised execution environments (Docker, podman), and embedded C++ runtimes on microcontrollers.

SDK

An SDK (Software Development Kit) is a curated set of libraries, interfaces, tools, sample code, and documentation intended for building applications and integrating with a specific hardware device, platform, or service. In robotics, an SDK typically exposes device control, telemetry, sensor access, configuration, and execution functions, significantly reducing the time-to-first-integration for developers targeting a specific robot or platform.

Main category
Control & Planning · Perception & Vision Software · Runtime & Infrastructure · SDKs
Roles in robotics ecosystem
Motion Planning · Perception · Computer Vision · Developer Enablement
Robot Control

Robot Control denotes the role of software responsible for motion control, command execution, coordination of actuating elements and the direct operational logic of the robot.

Software family
Family

A family of Vision-Language-Action (VLA) and foundation models for robotics: OpenVLA (Stanford/Berkeley), LeRobot (Hugging Face), RoboAgent (CMU), and RT-2 (Google DeepMind, available as a publication only). Trained on datasets such as Open X-Embodiment, BridgeData V2, and RoboNet.

Maturity & Adoption
6 / 9
Demonstration phase
Research · Prototype · Production
Adoption scale: Research / Prototype
Maintenance status: Actively Maintained
First release: 2024
Last update: 20 May 2026
Deployments

The de facto VLA baseline for academic teams since H2 2024 — used in 150+ scientific publications (Google Scholar, Q1 2026). Fine-tuning experiments: TRI (autonomous driving demonstrations), Stanford (Tidybot mobile manipulation), Berkeley (BridgeData V2). Commercial fine-tunes: Skild AI, Covariant (closed). HuggingFace Spaces demo with a teleop interface.

Community

github.com/openvla/openvla ~2.9k★, ~310 forks. HuggingFace `openvla/openvla-7b` ~50k downloads/month. The arXiv:2406.09246 paper has ~450 citations (Q1 2026). 'Open Robotics Foundation Models' Discord ~1.5k members. Active PRs with fine-tunes for specific domains.

ROS support: Compatibility with ROS / ROS 2 ecosystem
Community ROS 2 Wrapper: ROS 2 wrapper created and maintained by the community, not by the vendor
System capabilities
Open source
Source code is publicly available under an open-source license — enables security audits, custom modifications, and integration without licensing barriers.
Real-time capable
Designed with timing-determinism guarantees — meets the requirements of control loops, safety systems, and tasks demanding low, predictable latency.
×
API available
The software exposes a programmable interface (REST, gRPC, SDK, or language bindings) that enables automation and integration with other systems.
Pre-built / binary
Distributed as ready-to-use binary packages, container images, or installers — no need to build from source.
Programming languages
Python · CUDA
Operating systems
Ubuntu 22.04 · Ubuntu 20.04 · Debian
Ubuntu 24.04

Ubuntu 24.04 LTS 'Noble Numbat' — supported until April 2029. The host for ROS 2 Jazzy.

Minimum hardware requirements
CPU: Inference: 1× x86-64 CPU ≥ 3 GHz. Training (GPU cluster): 64× A100 for 14 days. LoRA fine-tuning: 1× A100 (10-20 hours).
RAM (GB): 64
GPU: Inference: 1× A100 40 GB or H100 (full precision). 4-bit quantization (bitsandbytes) runs on 1× RTX 4090 24 GB. Fine-tuning (LoRA): 1× A100 80 GB.
Disk (GB): 100
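
The GPU figures above follow roughly from the size of the weights at each precision. The sketch below is a back-of-the-envelope estimate for a nominal 7B-parameter model; it counts weights only and ignores activations, the KV cache, and quantization overhead, so real usage will be higher. The function name `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate weight storage in GB (10^9 bytes) for a given precision."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


N_PARAMS = 7e9  # nominal parameter count used in the text

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(N_PARAMS, precision):.1f} GB")
```

On this estimate, full-precision weights alone exceed a 24 GB card, while 4-bit weights leave ample headroom on it, which is consistent with the hardware guidance listed here.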

Open weights `openvla/openvla-7b` on HuggingFace. Code on GitHub `openvla/openvla` (MIT License). Llama 2 license for the backbone — requires acceptance of the Meta AI license for full commercial use.

Packaging & distribution
Package managers
pip / PyPI · GitHub Releases / GitHub Actions Artifacts · Docker / Docker Hub
CPU architectures
x86_64 (AMD64) · NVIDIA GPU (CUDA – x86_64) · ARM64 / AArch64 · NVIDIA Jetson – AArch64 (JetPack) · Apple Silicon – AArch64 (macOS)
Installation difficulty
Level: Advanced
Protocols and interfaces
Communication protocols
gRPC · REST API (HTTP/HTTPS) · WebSocket · ROS 2 Topics · Shared Memory (POSIX / mmap)
Hardware interfaces
Ethernet 1000BASE-T (Gigabit Ethernet) · Ethernet 10GBASE-T (10 Gigabit Ethernet) · USB 3.0 / 3.1 Gen 1 · PCIe 4.0 · MIPI CSI-2
Latency classes
Soft Real-Time (100–500 ms) · Near Real-Time (500 ms – 2 s)
Deployment types
Local Workstation · Cloud · Edge · Containerized
Supported simulators
NVIDIA Isaac Sim
MuJoCo
PyBullet / Bullet3
Licenses
MIT: MIT License

License family: Permissive

Modification · Distribution · Commercial use · Sublicensing · Private use · ROS-compatible · OSI approved · FSF Free/Libre · Requires attribution
Version history
OpenVLA-mini (4 GB) · Mar 2025

Small 3B-parameter variant for edge inference on Jetson AGX Orin (~150 ms per action).

OpenVLA + Mobile ALOHA · Jan 2025

OpenVLA integration with the Mobile ALOHA platform (Stanford), first dual-arm demonstration on a bimanual mobile robot.

CogACT (based on OpenVLA) · Nov 2024

A variant built on OpenVLA that uses a diffusion-based action head instead of discrete action tokens. Reports state-of-the-art results in long-horizon manipulation.

OpenVLA-OFT · Aug 2024

The 'OFT' variant: an Optimized Fine-Tuning recipe (LoRA-based) with better performance on few-shot tasks.

OpenVLA paper + checkpoint v1 · Jun 2024

First public release: the arXiv:2406.09246 paper and the `openvla/openvla-7b` checkpoint on HuggingFace.