Rho-alpha (ρα) is Microsoft Research's first robotics model, announced on January 21, 2026. Built on the Phi vision-language family, it targets bimanual manipulation under natural-language control. Microsoft positions it as the first VLA+ model — an extension of the classical Vision-Language-Action architecture with tactile sensing as a third perception modality and online learning from operator corrections after deployment.
What makes it VLA+
- Tactile sensing: the model reasons about how objects feel during manipulation, which is essential for tasks such as plug insertion, packing, and assembly with tight tolerances. Microsoft plans to extend this with force sensing.
- Online learning: when the robot fails, an operator can intervene via teleoperation or a 3D mouse, and Rho-alpha learns from that corrective feedback in real time, even after deployment.
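The correction loop described above can be illustrated with a toy example. This is a minimal sketch and not Microsoft's method: the announcement does not say how Rho-alpha incorporates corrections, so the sketch assumes a simple intervention-logging scheme with one gradient step per correction. `LinearPolicy`, `CorrectionBuffer`, and `run_step` are hypothetical names, and a one-dimensional linear policy stands in for the real model.

```python
from dataclasses import dataclass, field


@dataclass
class LinearPolicy:
    """Toy stand-in for a learned policy: action = w . features + b."""
    weights: list
    bias: float = 0.0

    def act(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def update(self, features, target, lr=0.1):
        """One SGD step on squared error toward the operator's action."""
        error = self.act(features) - target
        self.weights = [w - lr * error * x for w, x in zip(self.weights, features)]
        self.bias -= lr * error
        return error * error


@dataclass
class CorrectionBuffer:
    """Stores (features, operator_action) pairs recorded during interventions."""
    samples: list = field(default_factory=list)

    def record(self, features, operator_action):
        self.samples.append((list(features), operator_action))


def run_step(policy, buffer, features, operator_action=None):
    """Run one control step; if the operator intervened, log it and learn from it."""
    if operator_action is None:
        # No intervention: the policy's own action is executed.
        return policy.act(features)
    # Intervention: the operator's action overrides the policy and becomes
    # a supervised training target (an assumption of this sketch).
    buffer.record(features, operator_action)
    policy.update(features, operator_action)
    return operator_action


if __name__ == "__main__":
    policy = LinearPolicy(weights=[0.0, 0.0])
    buffer = CorrectionBuffer()
    for _ in range(50):
        run_step(policy, buffer, [1.0, 0.5], operator_action=1.0)
    print(len(buffer.samples), policy.act([1.0, 0.5]))
```

Keeping the buffer separate from the update step means the same logged corrections could later be replayed in a larger offline retraining pass, which is one plausible design and not something the source confirms.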
Training
The training pipeline is hybrid. It combines physical demonstrations from real robots, large-scale reinforcement learning (RL) simulations generated in NVIDIA Isaac Sim, and web-scale visual question answering (VQA) data. Simulation is central because no web-scale corpus of tactile interaction data exists.
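One common way to train on several data sources at once is to sample each batch element from a weighted mixture. The sketch below illustrates that idea only. The source names, the mixture weights, and the helper functions `sample_sources` and `build_batch` are all assumptions, since the announcement gives no proportions or pipeline details.

```python
import random

# Hypothetical mixture weights; the source does not state the real proportions.
MIXTURE = {
    "real_robot_demos": 0.2,
    "isaac_sim_rollouts": 0.5,
    "web_vqa": 0.3,
}


def sample_sources(batch_size, mixture, rng):
    """Draw one source name per batch element, in proportion to its weight."""
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=batch_size)


def build_batch(datasets, batch_size, mixture, rng):
    """Assemble a batch by picking a source, then a random example from it."""
    return [
        (source, rng.choice(datasets[source]))
        for source in sample_sources(batch_size, mixture, rng)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder examples standing in for real trajectories and VQA pairs.
    datasets = {
        "real_robot_demos": ["demo_a", "demo_b"],
        "isaac_sim_rollouts": ["sim_a", "sim_b", "sim_c"],
        "web_vqa": ["vqa_a", "vqa_b"],
    }
    for source, example in build_batch(datasets, 8, MIXTURE, rng):
        print(source, example)
```

Weighting the simulated source most heavily here simply mirrors the text's point that simulation fills the gap left by scarce tactile data; the actual ratio is unknown.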
Demonstrated capabilities
- BusyBox — Microsoft's own physical interaction benchmark, controlled by natural language.
- Plug insertion with tactile feedback and live operator corrections.
- Toolbox packing and object arrangement with bimanual coordination.
The model has been evaluated on dual-arm setups and humanoid robots. Microsoft has announced that a full technical report will follow in the coming months.