Rho-alpha (ρα)

Microsoft Research's first robotics model, built on the Phi family. Microsoft positions it as a VLA+ model: it extends the classical Vision-Language-Action (VLA) architecture with tactile sensing and online learning from human corrections.

🔬 Research only · Vision-Language-Action model · Robotics foundation model

Release date

21 January 2026

Producer: 🏢 Microsoft

Overview

Rho-alpha (ρα) is Microsoft Research's first robotics model, announced on 21 January 2026. Built on the Phi vision-language family, it targets bimanual manipulation under natural-language control. Microsoft positions it as the first VLA+ model, extending the classical Vision-Language-Action architecture in two ways: tactile sensing as a third perception modality, and online learning from operator corrections after deployment.

What makes it VLA+

Tactile sensing: the model reasons about how objects feel during manipulation, which matters for tasks such as plug insertion, packing, and assembly with tight tolerances. Microsoft plans to add force sensing as well.
Online learning: when the robot makes a mistake, an operator can intervene via teleoperation or a 3D mouse, and Rho-alpha learns from that corrective feedback in real time, including after deployment.
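Microsoft has not published how Rho-alpha's online learning works. Purely to illustrate the general idea of learning from human interventions (loosely in the spirit of intervention-based imitation learning such as HG-DAgger, not Microsoft's method), the sketch below lets a simulated operator take over a toy linear policy whenever its action is too far off; each (observation, operator action) pair is stored and the policy is refit by ridge-regularised least squares. The class, method names, threshold, and dimensions are all hypothetical.

```python
import numpy as np


class CorrectablePolicy:
    """Toy linear policy a = W @ obs, refit from operator corrections.

    Hypothetical illustration only; not Rho-alpha's architecture.
    """

    def __init__(self, obs_dim: int, act_dim: int, ridge: float = 1e-3):
        self.W = np.zeros((act_dim, obs_dim))
        self.ridge = ridge
        self.obs_buffer: list[np.ndarray] = []
        self.act_buffer: list[np.ndarray] = []

    def act(self, obs: np.ndarray) -> np.ndarray:
        return self.W @ obs

    def record_correction(self, obs: np.ndarray, operator_action: np.ndarray) -> None:
        # Store the state where the operator took over and the action they commanded.
        self.obs_buffer.append(obs)
        self.act_buffer.append(operator_action)
        self._refit()

    def _refit(self) -> None:
        # Ridge-regularised least squares over every correction collected so far.
        X = np.stack(self.obs_buffer)                  # (n, obs_dim)
        Y = np.stack(self.act_buffer)                  # (n, act_dim)
        A = X.T @ X + self.ridge * np.eye(X.shape[1])  # (obs_dim, obs_dim)
        self.W = np.linalg.solve(A, X.T @ Y).T         # (act_dim, obs_dim)


# Simulated deployment: a hidden linear map stands in for what the operator wants.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 4))
policy = CorrectablePolicy(obs_dim=4, act_dim=2)
interventions = 0
for step in range(200):
    obs = rng.normal(size=4)
    if np.linalg.norm(policy.act(obs) - W_true @ obs) > 0.1:
        policy.record_correction(obs, W_true @ obs)  # operator takes over
        interventions += 1
```

Interventions are frequent at first and stop once the policy's actions stay within the tolerance, which is the behaviour the "learn from corrections post-deployment" description implies. A real VLA would update a large neural network rather than refit a linear map.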

Training

Hybrid pipeline: physical demonstrations from real robots, simulated data generated with large-scale reinforcement learning in NVIDIA Isaac Sim, and web-scale Visual Question Answering (VQA) data. Simulation plays a central role because no web-scale corpus of tactile interaction data exists.
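The announcement names three data sources but not how they are mixed. One common way to combine heterogeneous sources is to sample the source of each training example according to fixed mixture weights; the sketch below does that with hypothetical weights chosen only for illustration (Microsoft has not published the real proportions), and the source names are likewise made up.

```python
import random
from collections import Counter

# Hypothetical mixture weights; not Rho-alpha's actual training proportions.
MIXTURE = {
    "real_robot_demos": 0.3,
    "isaac_sim_rl": 0.5,
    "web_vqa": 0.2,
}


def sample_sources(batch_size: int, mixture: dict[str, float], rng: random.Random) -> list[str]:
    """Pick the data source for each example in a batch, proportional to its weight."""
    names = list(mixture)
    weights = [mixture[n] for n in names]
    return rng.choices(names, weights=weights, k=batch_size)


rng = random.Random(0)
counts = Counter(sample_sources(10_000, MIXTURE, rng))
for source, n in sorted(counts.items()):
    print(f"{source}: {n / 10_000:.2%}")
```

Sampling by weight rather than concatenating the datasets keeps the proportions fixed regardless of how large each source is, which matters when one source (here, simulation) can be generated in far greater volume than the others.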

Demonstrated capabilities

BusyBox: Microsoft's own physical-interaction benchmark, with the robot directed by natural-language commands.
Plug insertion with tactile feedback and live operator corrections.
Toolbox packing and object arrangement with bimanual coordination.

Evaluated on dual-arm setups and humanoid robots. Microsoft says a full technical report will follow in the coming months.

Classification

Vision-Language-Action model · Robotics foundation model

Access & deployment

Weights: Closed

Key parameters

📥 Input: text, image, robot sensors, robot state data
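Microsoft has not published Rho-alpha's input format. As a sketch of how the listed modalities might be bundled into one observation per control step, the dataclass below uses hypothetical field names and shapes; tactile readings get their own field because the overview describes touch as the model's third perception modality.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """One control-step input bundle. Field names and shapes are hypothetical."""

    instruction: str          # natural-language command
    images: list[np.ndarray]  # camera frames, each (H, W, 3) uint8
    tactile: np.ndarray       # flattened tactile-sensor readings
    joint_state: np.ndarray   # robot state, e.g. joint positions of both arms

    def __post_init__(self) -> None:
        if not self.instruction.strip():
            raise ValueError("instruction must be non-empty")
        for img in self.images:
            if img.ndim != 3 or img.shape[2] != 3:
                raise ValueError(f"expected (H, W, 3) image, got {img.shape}")
        if self.tactile.ndim != 1 or self.joint_state.ndim != 1:
            raise ValueError("tactile and joint_state must be 1-D vectors")

    def proprio_vector(self) -> np.ndarray:
        """Concatenate the non-visual sensor streams into one feature vector."""
        return np.concatenate([self.tactile, self.joint_state]).astype(np.float32)


obs = Observation(
    instruction="insert the plug into the socket",
    images=[np.zeros((224, 224, 3), dtype=np.uint8)],
    tactile=np.zeros(32),
    joint_state=np.zeros(14),
)
print(obs.proprio_vector().shape)  # (46,)
```

Keeping vision separate from the concatenated tactile and joint-state vector mirrors a common VLA design in which images go through a vision encoder while low-dimensional sensor streams are projected directly; whether Rho-alpha does this is not stated.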