
Sharpa's vision-language-action (VLA) robotic model, designed for contact-rich, bimanual manipulation tasks using vision, language, force, and touch.
Context window
Not publicly disclosed
Parameters
Not publicly disclosed; the backbone comprises SigLIP So400m/14, PaliGemma (Gemma-3B), and a Gemma-300M action expert
Release date
9 March 2026
Overview
Key parameters
📏 Context: not publicly disclosed
🧩 Parameters: not publicly disclosed; the backbone comprises SigLIP So400m/14, PaliGemma (Gemma-3B), and a Gemma-300M action expert
📥 Input: text, robot_vision, robot_sensors, robot_state_data
Technical specification
Context window
Not publicly disclosed
Parameters
Not publicly disclosed; the backbone comprises SigLIP So400m/14, PaliGemma (Gemma-3B), and a Gemma-300M action expert
License
Paper: CC BY 4.0. License for the model weights: not publicly disclosed.
Hardware requirements
Requires an advanced robotic platform with RGB cameras, proprioception, torque/force sensing, and tactile sensors on the hands; demonstrated on the Sharpa North platform with two Sharpa Wave hands.
Modalities
⬇ Input
text, robot_vision, robot_sensors, robot_state_data
⬆ Output
robot_actions, robot_commands, manipulator_control, motion_trajectories
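As a rough illustration of how the modality tags above could be used on the integration side, the following is a minimal Python sketch that checks whether an observation carries a payload for every listed input modality. Only the tag strings come from this card; the `Observation` class, its `payloads` field, and the `missing` and `is_complete` helpers are hypothetical names introduced for illustration and do not describe the model's actual interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Modality tags as listed on this card; everything else below is hypothetical.
INPUT_MODALITIES = ("text", "robot_vision", "robot_sensors", "robot_state_data")
OUTPUT_MODALITIES = (
    "robot_actions",
    "robot_commands",
    "manipulator_control",
    "motion_trajectories",
)


@dataclass
class Observation:
    """Hypothetical container holding one payload per input modality tag."""

    payloads: Dict[str, object] = field(default_factory=dict)

    def missing(self) -> List[str]:
        # Return the input tags that have no payload yet, in card order.
        return [m for m in INPUT_MODALITIES if m not in self.payloads]

    def is_complete(self) -> bool:
        # True only when every listed input modality is present.
        return not self.missing()
```

A caller might build `Observation({"text": "pick up the cup"})` and inspect `missing()` before passing data on; the payload types and shapes are left open here because the card does not specify them.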
Capabilities and applications
Native model capabilities
Reasoning
The model's ability to reason logically and solve complex problems.
Category: reasoning
Planning
Forming and executing action plans for complex tasks.
Category: planning
Image understanding
Analysing and interpreting the content of images.
Category: vision
Multimodal understanding
Category: multimodal