
MSAT

2026, Experimental, Published
Key innovation
Integrates heterogeneous robotic modalities (vision, language, proprioception, tactile sensing, motor signals) as separate modality-specific token streams inside a single transformer, fused via cross-modal joint self-attention. This lets a VLA policy learn broad scene understanding together with narrow functional capabilities (motion awareness, long-term memory, physical sensing) in one model, without the compromises of a hand-engineered multi-stage pipeline.
Category
Robotics
Abstraction level
Pattern

Components

Modality-specific streams
Cross-modal joint self-attention
Action head
Modality positional/type encoding
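
How the four components fit together can be illustrated with a minimal sketch. It is not the MSAT implementation: it assumes a single attention head, random untrained weights, mean pooling before the action head, and a 7-dimensional action vector, and it omits positional encoding and keeps only the modality type encoding. All function names, dimensions, and the choice of four streams are hypothetical.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def joint_self_attention(tokens, Wq, Wk, Wv):
    """Single-head attention: every token attends to tokens of all modalities."""
    q = [matvec(Wq, t) for t in tokens]
    k = [matvec(Wk, t) for t in tokens]
    v = [matvec(Wv, t) for t in tokens]
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / scale for kj in k])
        out.append([sum(w * vj[d] for w, vj in zip(weights, v))
                    for d in range(len(v[0]))])
    return out

def fuse_and_act(streams, proj, type_emb, attn, W_action):
    """streams maps a modality name to its list of raw feature vectors."""
    tokens = []
    for name, feats in streams.items():
        for f in feats:
            tok = matvec(proj[name], f)                          # modality-specific stream
            tok = [t + e for t, e in zip(tok, type_emb[name])]   # modality type encoding
            tokens.append(tok)
    fused = joint_self_attention(tokens, *attn)                  # cross-modal fusion
    pooled = [sum(col) / len(fused) for col in zip(*fused)]      # mean over all tokens
    return matvec(W_action, pooled)                              # action head

# Hypothetical setup: shared width 8, two tokens per stream, 7-dim action.
rng = random.Random(0)
D = 8
raw_dims = {"vision": 6, "language": 5, "proprioception": 4, "tactile": 3}
streams = {n: [[rng.random() for _ in range(d)] for _ in range(2)]
           for n, d in raw_dims.items()}
proj = {n: rand_matrix(D, d, rng) for n, d in raw_dims.items()}
type_emb = {n: [rng.uniform(-0.1, 0.1) for _ in range(D)] for n in raw_dims}
attn = (rand_matrix(D, D, rng), rand_matrix(D, D, rng), rand_matrix(D, D, rng))
W_action = rand_matrix(7, D, rng)

action = fuse_and_act(streams, proj, type_emb, attn, W_action)
```

Each modality keeps its own projection into the shared width, so streams with different raw dimensions can be concatenated into one sequence before attention.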

Implementation

Implementation pitfalls
Modality stream imbalance (Critical)
Sequence length explosion (High)
Noisy sensor streams (High)
Real-time inference latency (Medium)
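
The sequence-length pitfall follows from simple arithmetic: joint self-attention runs over the concatenation of all streams, so the number of attention pairs grows with the square of the total token count. The sketch below makes that concrete. The per-modality token counts and the history lengths are made-up illustrative values, not figures from MSAT.

```python
def joint_sequence_length(tokens_per_step, history_steps):
    """Total tokens when every stream is concatenated over a history window."""
    return sum(tokens_per_step.values()) * history_steps

def attention_pairs(seq_len):
    """Query-key pairs scored by one full self-attention layer."""
    return seq_len * seq_len

# Hypothetical per-timestep token counts for each modality stream.
tokens_per_step = {"vision": 256, "language": 32, "proprioception": 1, "tactile": 16}

short = joint_sequence_length(tokens_per_step, history_steps=1)  # 305 tokens
long = joint_sequence_length(tokens_per_step, history_steps=4)   # 1220 tokens

print(short, attention_pairs(short))  # 305 93025
print(long, attention_pairs(long))    # 1220 1488400
```

In this example, quadrupling the history window multiplies the attention pairs by sixteen, which is also why sequence length feeds directly into the real-time latency pitfall.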
Technical details

Hyperparameters (configurable axes)

Number and type of modality streams (Critical)
Per-modality tokenizer (High)
Cross-modal fusion depth (High)
Action prediction horizon (Medium)
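
The four configurable axes above could be grouped into a single config object. The sketch below is only one possible layout: the class name, field names, default values, and tokenizer labels are hypothetical and do not come from an MSAT codebase.

```python
from dataclasses import dataclass, field

def _default_tokenizers():
    # Hypothetical tokenizer labels, one per modality stream.
    return {"vision": "patch", "language": "subword", "proprioception": "linear"}

@dataclass(frozen=True)
class StreamFusionConfig:
    # Number and type of modality streams
    modality_streams: tuple = ("vision", "language", "proprioception")
    # Per-modality tokenizer
    tokenizers: dict = field(default_factory=_default_tokenizers)
    # Cross-modal fusion depth (count of joint self-attention layers)
    fusion_depth: int = 6
    # Action prediction horizon (future actions predicted per forward pass)
    action_horizon: int = 8

    def __post_init__(self):
        # Every declared stream needs a tokenizer before it can enter the sequence.
        missing = [m for m in self.modality_streams if m not in self.tokenizers]
        if missing:
            raise ValueError(f"no tokenizer configured for: {missing}")
        if self.fusion_depth < 1 or self.action_horizon < 1:
            raise ValueError("fusion_depth and action_horizon must be >= 1")

config = StreamFusionConfig()
```

Validating that each stream has a tokenizer at construction time catches the most likely misconfiguration, adding a stream without saying how it is tokenized.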

Execution paradigm

Primary mode
dense
Activation pattern
all_paths_active

Parallelism

Parallelism level
partially_parallel
Scope
training, across_tokens

Hardware requirements

Primary
Good fit