
Xiaomi CyberOne
Xiaomi’s full-size humanoid prototype unveiled on August 11, 2022 — 177 cm tall, 52 kg, 21 degrees of freedom, featuring the MiSense depth-vision module and an environmental voice and emotion recognition engine.
- Research

Perception · Perception & Vision Software
4.0 (LLM integration)·Xiaomi
MiAI Environment Voice & Semantic Recognition Engine is Xiaomi's internal AI stack for Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and multimodal fusion with cameras and depth sensors. The engine was created in 2017 as part of the MiAI platform for voice assistants (XiaoAI) and extended in 2022 for the Xiaomi CyberOne humanoid.
The engine has three layers: an audio front-end (beamforming over a 6-microphone array, acoustic echo cancellation (AEC), and MetricGAN denoising), ASR (a Conformer-CTC end-to-end model for Mandarin and Cantonese, and a transducer model for English), and NLU (slot-filling with a fine-tuned PaLM2 variant). For CyberOne, the engine adds voice/vision fusion (face direction, gesture) and sound-source localization.
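Xiaomi has not published how the engine localizes sound sources, so the following is only a minimal sketch of one textbook approach: estimate the time difference of arrival (TDOA) between two microphones by cross-correlation, then convert it to a far-field angle. The function names, the 0.1 m microphone spacing, and the 16 kHz sample rate are illustrative assumptions, not details of the MiAI engine.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def best_lag(ref, sig, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `sig` is a delayed copy of `ref`.
    Hypothetical helper for illustration only.
    """
    n = len(ref)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += ref[i] * sig[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def arrival_angle_deg(lag, sample_rate, mic_spacing):
    """Convert a sample lag to a far-field angle relative to broadside."""
    delay = lag / sample_rate                      # seconds
    ratio = delay * SPEED_OF_SOUND / mic_spacing   # sin(angle)
    ratio = max(-1.0, min(1.0, ratio))             # clamp against noise
    return math.degrees(math.asin(ratio))


# Synthetic test signal: the second microphone hears the same pulse
# three samples later than the first.
pulse = [1.0, 3.0, -2.0, 5.0, -1.0]
mic_a = [0.0] * 8 + pulse + [0.0] * 8
mic_b = [0.0] * 11 + pulse + [0.0] * 5

lag = best_lag(mic_a, mic_b, max_lag=6)
angle = arrival_angle_deg(lag, sample_rate=16000, mic_spacing=0.1)
```

With these assumed values the three-sample delay corresponds to a source roughly 40 degrees off broadside. A production system would more likely use a frequency-domain method such as GCC-PHAT across all six microphones, which is more robust to reverberation than plain cross-correlation.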
The engine runs in a hybrid mode: wakeword detection and basic intents are handled offline on the device (on the Xiaomi Surge G1 in CyberOne), while full ASR/NLU runs in the Xiaomi MiAI Cloud. It is also used in Xiaomi Smart Home, Mi Mix Alpha, domestic robots, and CyberDog 2.
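The hybrid split can be pictured as a small routing decision. The sketch below is hypothetical: the intent names, the `Route` type, and the offline fallback when the cloud is unreachable are assumptions made for illustration, since the stack is closed-source and its actual dispatch logic is not public.

```python
from dataclasses import dataclass

# Hypothetical set of intents simple enough to resolve on-device.
OFFLINE_INTENTS = {"wake", "stop", "volume_up", "volume_down"}


@dataclass
class Route:
    target: str  # "device" or "cloud"
    reason: str


def route_request(intent_guess: str, cloud_reachable: bool) -> Route:
    """Decide where an utterance should be processed."""
    if intent_guess in OFFLINE_INTENTS:
        # Wakeword and basic intents never leave the device.
        return Route("device", "basic intent handled offline")
    if cloud_reachable:
        # Anything else goes to the full cloud ASR/NLU pipeline.
        return Route("cloud", "full ASR/NLU required")
    # Assumed behaviour: degrade to the on-device model when offline.
    return Route("device", "cloud unreachable, offline fallback")
```

For example, `route_request("stop", cloud_reachable=False)` stays on the device, while `route_request("weather_query", cloud_reachable=True)` is sent to the cloud.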
A Perception Stack encompasses the software layers that process data from cameras, LiDARs, IMUs, microphones, and other sensors in order to recognise the surrounding environment, perform localisation, detect and track objects, and interpret the scene. It is typically the first processing stage in an autonomous robot's data pipeline, feeding its outputs to planning and control stacks.
A Runtime is the environment or execution layer used to run code, load libraries, manage dependencies, and operate applications or services — either in real time or during normal system operation. In robotics this includes real-time operating system (RTOS) runtimes, ROS 2 executor runtimes, containerised execution environments (Docker, podman), and embedded C++ runtimes on microcontrollers.
An API Library is a software package that exposes programmatic interfaces for communicating with a device, service, or system. In robotics it typically forms a lightweight integration layer built on top of the manufacturer's official API or an open-source project, abstracting low-level protocol details and providing language-native bindings (Python, C++, Java, etc.).
Xiaomi CyberOne, CyberDog 2, XiaoAI assistant (300M+ devices), Xiaomi Smart Home Hub, Xiaomi SU7 (electric car, voice cockpit).
No public community — closed-source stack. Xiaomi AI Lab internal team: 1,500+ engineers.
Wakeword model ~12 MB on-device, full offline ASR ~280 MB. Cloud inference requires a 1+ Mbps connection.
License family: Proprietary – Commercial
Integration of MiLM (Xiaomi LLM 6B/13B) for advanced NLU.
Adapted for the CyberOne humanoid; added sound-source localization.
Added voice-vision fusion in Mi Mix Alpha.
First release for the XiaoAI assistant.