OpenVLA

OpenVLA-mini (4 GB) · Stanford University

Active · Open source · API available

Category: Control · Control & Planning

Readiness: TRL 6

Adoption scale: Research / Prototype

Licenses: MIT

First release: 2024

**OpenVLA** is a fully open-source vision-language-action (VLA) model that follows the RT-2 approach of predicting robot actions as tokens, announced in June 2024 (paper 'OpenVLA: An Open-Source Vision-Language-Action Model', Kim et al., arXiv:2406.09246). It was developed jointly by Stanford AI Lab, UC Berkeley (Robot Learning Lab), Google DeepMind, Toyota Research Institute, MIT and Physical Intelligence. OpenVLA fills the gap left by the closed RT-2 by releasing the **model weights**, **training code**, **fine-tuning recipes**, and **complete data pipeline**.

Architecture: ~**7B parameters** built from three components. (1) **Vision encoder**: a fusion of DINOv2 ViT-L/14 (low-level spatial features) and SigLIP So400m/14 (semantic, language-aligned features). (2) **LLM backbone**: Llama 2 7B. (3) **Action output**: each action dimension is discretized into 256 bins (as in RT-2), and the model emits actions by next-token prediction over the resulting action tokens.
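The 256-bin discretization described above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes uniform bins over a fixed per-dimension range, and the `low` / `high` bounds and function names are hypothetical.

```python
from typing import List, Sequence

N_BINS = 256  # bins per action dimension, as stated in the architecture summary


def discretize(action: Sequence[float], low: Sequence[float], high: Sequence[float]) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, N_BINS - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        clipped = min(max(a, lo), hi)        # clamp to the assumed range
        frac = (clipped - lo) / (hi - lo)    # normalize to [0, 1]
        tokens.append(min(int(frac * N_BINS), N_BINS - 1))
    return tokens


def undiscretize(tokens: Sequence[int], low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Map bin indices back to the center of each bin."""
    return [lo + (t + 0.5) * (hi - lo) / N_BINS for t, lo, hi in zip(tokens, low, high)]


if __name__ == "__main__":
    low, high = [-1.0] * 3, [1.0] * 3      # hypothetical action bounds
    tokens = discretize([-1.0, 0.0, 1.0], low, high)
    print(tokens)                          # bin indices, one per dimension
    print(undiscretize(tokens, low, high)) # reconstructed bin centers
```

The round trip loses at most half a bin width per dimension, which is the resolution limit of any 256-bin scheme over the chosen range.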

Training data: ~970,000 demonstrations from **Open X-Embodiment** (Google DeepMind, 21 institutions), covering 22 robots (Franka, UR5, WidowX, Sawyer, Google Robot etc.) and ~500 tasks. Training time: 14 days on 64× A100 (about 21,500 A100-hours), as reported in the paper.

Results: the paper reports that OpenVLA outperforms RT-2-X (55B) by **16.5 pp in absolute task success rate** across 29 evaluation tasks and multiple robot embodiments, despite having about 7× fewer parameters. LoRA fine-tuning on a custom dataset takes 10-20 hours on 1× A100 and adapts the model to a new robot with 100-500 demonstrations.
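To see why LoRA fine-tuning fits on a single GPU, the number of trainable adapter parameters can be estimated with simple arithmetic. The sketch below assumes adapters of rank `r` on every linear projection of the Llama 2 7B backbone only (hidden size 4096, MLP size 11008, 32 layers); the rank of 32 and the choice of target layers are assumptions for illustration, and the vision encoder is ignored.

```python
HIDDEN = 4096   # Llama 2 7B hidden size
MLP = 11008     # Llama 2 7B feed-forward size
LAYERS = 32     # Llama 2 7B transformer layers


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """A LoRA adapter adds two matrices: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


def llama2_7b_lora_params(rank: int) -> int:
    """Trainable parameters when every attention and MLP projection gets an adapter."""
    attn = 4 * lora_params(HIDDEN, HIDDEN, rank)            # q, k, v, o projections
    mlp = 2 * lora_params(HIDDEN, MLP, rank) + lora_params(MLP, HIDDEN, rank)  # gate, up, down
    return LAYERS * (attn + mlp)


if __name__ == "__main__":
    trainable = llama2_7b_lora_params(rank=32)  # rank 32 is an assumed setting
    print(f"{trainable:,} trainable parameters")
    print(f"{100 * trainable / 7e9:.2f}% of a ~7B model")
```

Under these assumptions the adapters amount to roughly 80M parameters, a little over 1% of the full model, which is what keeps optimizer state and gradients small enough for one A100.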

Ecosystem: integration with **HuggingFace Transformers** (`openvla/openvla-7b`), support for 4-bit quantization (bitsandbytes), compatibility with PyTorch 2.0+. Impact: OpenVLA has become a **de facto VLA baseline** in academia and the basis of follow-up works such as CogACT and TraceVLA. Reproducibility: full checkpoints, dataset indices, and training scripts are released.
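A hedged sketch of loading the checkpoint in 4-bit mode through Transformers is shown below. It follows the usage pattern published with the checkpoint as far as we know it; the helper names, the `float16` dtype, and the `unnorm_key` value `"bridge_orig"` are assumptions, and the model's remote code may require specific `transformers` and `bitsandbytes` versions. Heavy imports are deferred so the prompt helper can be used without a GPU.

```python
def build_prompt(instruction: str) -> str:
    """Prompt template used in the published OpenVLA usage example."""
    return f"In: What action should the robot take to {instruction.lower()}?\nOut:"


def load_openvla_4bit(model_id: str = "openvla/openvla-7b"):
    """Load processor and model with bitsandbytes 4-bit weights (needs a CUDA GPU)."""
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor, BitsAndBytesConfig

    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumed compute dtype for quantized mode
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        low_cpu_mem_usage=True,
        trust_remote_code=True,     # the model class ships with the checkpoint
    )
    return processor, model


def predict(processor, model, image, instruction: str, unnorm_key: str = "bridge_orig"):
    """Run one inference step; `image` is a PIL image from the robot camera."""
    import torch

    inputs = processor(build_prompt(instruction), image).to(model.device, dtype=torch.float16)
    # predict_action is provided by the checkpoint's remote code, not by Transformers itself.
    return model.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)


if __name__ == "__main__":
    print(build_prompt("Pick up the cup"))
```

The `unnorm_key` selects which dataset's statistics are used to de-normalize the predicted action, so it should match the robot setup the model was trained or fine-tuned on.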

Documentation

Type & Roles

Software types

VLA / Foundation Model · Runtime · SDK

Runtime

A Runtime is the environment or execution layer used to run code, load libraries, manage dependencies, and operate applications or services — either in real time or during normal system operation. In robotics this includes real-time operating system (RTOS) runtimes, ROS 2 executor runtimes, containerised execution environments (Docker, podman), and embedded C++ runtimes on microcontrollers.

SDK

An SDK (Software Development Kit) is a curated set of libraries, interfaces, tools, sample code, and documentation intended for building applications and integrating with a specific hardware device, platform, or service. In robotics, an SDK typically exposes device control, telemetry, sensor access, configuration, and execution functions, significantly reducing the time-to-first-integration for developers targeting a specific robot or platform.

Main category

Control & Planning · Perception & Vision Software · Runtime & Infrastructure · SDKs

Roles in robotics ecosystem

Robot Control · Motion Planning · Perception · Computer Vision · Developer Enablement

Software family

A family of open Vision-Language-Action (VLA) models, foundation models, and related tooling for robotics: OpenVLA (Stanford/Berkeley), LeRobot (Hugging Face), RoboAgent (CMU), and RT-2 (Google DeepMind; published as a paper only, weights not released). Trained on datasets such as Open X-Embodiment, BridgeData V2, and RoboNet.

Maturity & Adoption

TRL 6 / 9 (Demonstration phase)

Research · Prototype · Production

Adoption scale: Research / Prototype

Maintenance status: Actively Maintained

First release: 2024

Last update: 20 May 2026

Deployments

The de facto VLA baseline for academic teams since H2 2024 — used in 150+ scientific publications (Google Scholar, Q1 2026). Fine-tuning experiments: TRI (autonomous driving demonstrations), Stanford (Tidybot mobile manipulation), Berkeley (BridgeData V2). Commercial fine-tunes: Skild AI, Covariant (closed). HuggingFace Spaces demo with a teleop interface.

Community

github.com/openvla/openvla ~2.9k★, ~310 forks. HuggingFace `openvla/openvla-7b` ~50k downloads/month. The arXiv:2406.09246 paper has ~450 citations (Q1 2026). 'Open Robotics Foundation Models' Discord ~1.5k members. Active PRs with fine-tunes for specific domains.

Organizations

Stanford University

Producer · Primary

University of California, Berkeley

Integrates with

NVIDIA Isaac Sim

Photorealistic robotics simulator (RTX) with advanced PhysX 5 physics. Built on Omniverse Kit, supports ROS 2, synthetic data generation (SDG), Isaac Lab training, and the Isaac ROS deployment pipeline on Jetson.

MuJoCo (Multi-Joint dynamics with Contact)

Open-source rigid-body physics engine with accurate contact and friction, by Emo Todorov. Apache 2.0 since 2021, maintained by DeepMind. Standard for learning-based robotics (RL) and sim-to-real.

ROS 2

Open-source framework for building robot software. The successor to ROS 1, built on DDS with native support for distributed, real-time and multi-platform systems. The de facto standard in research and commercial robotics.

LeRobot

LeRobot (Hugging Face) — open-source robot learning framework with ACT, Diffusion Policy and TDMPC implementations. Standardises data collection and policy training for manipulators and mobile robots.

MoveIt 2

Open-source motion planning, manipulation, and kinematics framework for ROS 2 (Foxy → Jazzy). Stewarded by PickNik Robotics. The de facto standard for manipulators in the ROS ecosystem.

Related robotics software

π0 (pi-zero)

Physical Intelligence's first 'generalist robot policy' — VLA with flow matching, 50 Hz actions, trained on 10,000+ hours of demonstrations. Open weights π0-base (February 2025, Apache 2.0).

π0.5 (pi-zero-5)

π0.5 (pi-zero-5, Physical Intelligence) — evolution of π0 focused on open-world mobile manipulation: generalises to new environments and tasks without additional fine-tuning thanks to large-scale training data.

RT-2 (Robotics Transformer 2)

A Google DeepMind Vision-Language-Action model based on PaLI-X / PaLM-E. Translates images and language instructions into robot action tokens. Among the first models to map language to action at real-robot scale (2023).

NVIDIA GR00T N1

Open foundation model for humanoids: dual-system VLA (fast motor policy + slow semantic reasoning), trained on cross-embodiment data, available through NVIDIA Isaac.

Supported robot models

Unitree G1

Bipedal humanoid robot by Unitree Robotics, designed as a compact research, development, and developer platform.

Applications

Research
Home Assistance

Unitree H1

Unitree H1 is a full-size general-purpose humanoid robot (~180 cm, ~47 kg). Bipedal, 5 DOF per leg + 4 DOF per arm, 3.3 m/s walking speed, 360° perception via 3D LiDAR + depth camera, Unitree M107 PMSM joint motors with ~360 N·m peak knee torque. Standard compute: Intel Core i5/i7; optional NVIDIA Jetson Orin NX.

Applications

Research

Figure 03

Figure 03 is the third-generation humanoid robot from Figure AI, designed for Helix, home environments, and scalable mass production.

Applications

Factory Automation
Industrial Logistics
Production Line Operation
Warehouse Automation
Object Manipulation
Home Assistance

Atlas

Boston Dynamics bipedal humanoid robot. The fully electric generation unveiled in 2024 succeeds the hydraulic Atlas that was retired after more than a decade of research.

Applications

Factory Automation
Research
Production Line Operation
Object Manipulation

Target robotic platforms

Robotic Arm

Mobile Robot

Service Robot

Research Robot

Supported hardware

NVIDIA Jetson AGX Orin 64GB

compute · compute_modules · industrial

NVIDIA Jetson AGX Thor

compute · compute_modules · industrial

Intel RealSense D435i

sensing · cameras · research · Stereo RGB-D camera

Intel RealSense D455

sensing · cameras · professional

Stereolabs ZED 2i

sensing · cameras · industrial

ROS support: Compatibility with ROS / ROS 2 ecosystem

Community ROS 2 Wrapper: a ROS 2 wrapper created and maintained by the community, not by the producer

System capabilities

Open source ✓

Source code is publicly available under an open-source license, which enables security audits, custom modifications, and integration without licensing barriers.

Real-time capable

Designed with timing-determinism guarantees: meets the requirements of control loops, safety systems, and tasks demanding low, predictable latency.

API available ✓

The software exposes a programmable interface (REST, gRPC, SDK, or language bindings) that enables automation and integration with other systems.

Pre-built / binary ✓

Distributed as ready-to-use binary packages, container images, or installers, with no need to build from source.

Programming languages

Python · CUDA

Operating systems

Ubuntu 22.04 · Ubuntu 20.04 · Ubuntu 24.04 · Debian

Ubuntu 24.04

Ubuntu 24.04 LTS 'Noble Numbat' — supported until April 2029. The host for ROS 2 Jazzy.

Minimum hardware requirements

CPU: 1× x86-64 CPU ≥ 3 GHz (inference).

RAM (GB): 64

GPU: Inference: 1× A100 40 GB or H100 (full precision); 4-bit quantization (bitsandbytes) runs on 1× RTX 4090 24 GB. LoRA fine-tuning: 1× A100 80 GB (10-20 hours). Full pretraining: 64× A100 (14 days, as reported in the paper).

Disk (GB): 100
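The GPU figures above can be sanity-checked with a back-of-the-envelope estimate of weight memory at different precisions. This is a rough sketch that assumes a round 7B parameter count and counts weights only; activations, the KV cache, and quantization overhead are not included, so real usage is higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Approximate memory for model weights only, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumed round figure for a ~7B model
    for precision in ("fp32", "bf16", "int8", "int4"):
        print(f"{precision}: {weight_memory_gb(n_params, precision):.1f} GB")
```

Under these assumptions, full-precision weights need about 28 GB, which is consistent with a 40 GB card, while 4-bit weights need about 3.5 GB and leave ample headroom on a 24 GB card.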

Open weights `openvla/openvla-7b` on HuggingFace. Code on GitHub `openvla/openvla` (MIT License). Llama 2 license for the backbone — requires acceptance of the Meta AI license for full commercial use.

Packaging & distribution

Package managers

pip / PyPI · GitHub Releases / GitHub Actions Artifacts · Docker / Docker Hub

CPU architectures

x86_64 (AMD64) · NVIDIA GPU (CUDA – x86_64) · ARM64 / AArch64 · NVIDIA Jetson – AArch64 (JetPack) · Apple Silicon – AArch64 (macOS)

Installation difficulty

Level: Advanced

Protocols and interfaces

Communication protocols

gRPC · REST API (HTTP/HTTPS) · WebSocket · ROS 2 Topics · Shared Memory (POSIX / mmap)

Hardware interfaces

Ethernet 1000BASE-T (Gigabit Ethernet) · Ethernet 10GBASE-T (10 Gigabit Ethernet) · USB 3.0 / 3.1 Gen 1 · PCIe 4.0 · MIPI CSI-2

Latency classes

Soft Real-Time (100–500 ms) · Near Real-Time (500 ms – 2 s)
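As an illustration of what these latency classes mean for a control loop, the sketch below maps a per-action inference latency to one of the two listed classes and to the control rate it allows. The function name and the handling of the shared 500 ms boundary (assigned to Soft Real-Time) are assumptions.

```python
from typing import Tuple

# Latency classes as listed above, in milliseconds.
LATENCY_CLASSES = [
    ("Soft Real-Time", 100.0, 500.0),
    ("Near Real-Time", 500.0, 2000.0),
]


def classify_latency(latency_ms: float) -> Tuple[str, float]:
    """Return (class name, achievable control rate in Hz) for one inference latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    rate_hz = 1000.0 / latency_ms  # one action per inference call
    for name, lo, hi in LATENCY_CLASSES:
        if lo <= latency_ms <= hi:  # first match wins, so 500 ms counts as Soft Real-Time
            return name, rate_hz
    return "outside listed classes", rate_hz


if __name__ == "__main__":
    name, rate = classify_latency(150.0)  # e.g. the ~150 ms per action quoted for OpenVLA-mini
    print(f"{name}: about {rate:.1f} Hz")
```

A policy running at a few hertz like this typically sits above a faster low-level controller that tracks the commanded poses between inference steps.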

Deployment types

Local Workstation · Cloud · Edge · Containerized

Supported simulators

NVIDIA Isaac Sim

MuJoCo

PyBullet / Bullet3

Licenses

MIT: MIT License

License family: Permissive

Modification · Distribution · Commercial use · Sublicensing · Private use · ROS-compatible · OSI approved · FSF Free/Libre · Requires attribution

Version history

OpenVLA-mini (4 GB) · Mar 2025

Small 3B-parameter variant for edge inference on Jetson AGX Orin (~150 ms per action).

OpenVLA-OFT · Feb 2025

The 'OFT' variant: an Optimized Fine-Tuning recipe (LoRA-based) with better performance on few-shot tasks.

OpenVLA + Mobile Aloha · Jan 2025

OpenVLA integration with the Mobile Aloha platform (Stanford), first dual-arm demonstration on a wheeled mobile manipulator.

CogACT (based on OpenVLA) · Nov 2024

A variant from Microsoft Research Asia and collaborators with a diffusion-based action head instead of discrete tokens. SOTA results in long-horizon manipulation.

OpenVLA paper + checkpoint v1 · Jun 2024

First public release — arXiv:2406.09246 paper + the `openvla/openvla-7b` checkpoint on HuggingFace.