
Agentic multimodal model by Xiaohongshu (RedNote-hilab) integrating image understanding, web search and code execution within a unified reasoning chain.
Research · Research only · Open source · Multimodal · Tool-using model · Agent model · Vision
Parameters
7B / 32B
Release date
7 November 2025
Access: Download · Deployment: Local, Cloud
Overview
Classification
Multimodal · Tool-using model · Agent model · Vision
Access & deployment
Download
Local · Cloud
Weights: Open source
Key parameters
Parameters: 7B / 32B
Tools · Fine-tuning
Input: text, image
Technical specification
Parameters
7B / 32B
License
Apache-2.0
Hardware requirements
Training: at least 32 GPUs (4 nodes × 8 GPUs) for the 7B variant and at least 64 GPUs (8 nodes × 8 GPUs) for the 32B variant. Each node needs at least 1200 GB of RAM because of the high-resolution images in the V* and ArxivQA datasets.
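The GPU totals above are simply nodes multiplied by GPUs per node. The sketch below works through that arithmetic and also derives the aggregate RAM implied by the per-node minimum. The `TrainingCluster` class and its field names are hypothetical, introduced only for illustration, and the aggregate RAM figures are computed here rather than taken from the model's documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingCluster:
    """Hypothetical helper describing the minimum training cluster for one variant."""

    variant: str
    nodes: int
    gpus_per_node: int = 8            # the "x 8" in the stated requirements
    min_ram_gb_per_node: int = 1200   # per-node RAM minimum stated above

    @property
    def total_gpus(self) -> int:
        # Total GPUs = number of nodes times GPUs per node.
        return self.nodes * self.gpus_per_node

    @property
    def total_ram_gb(self) -> int:
        # Derived figure: per-node minimum summed over all nodes.
        return self.nodes * self.min_ram_gb_per_node


# Node counts taken from the requirements above.
CLUSTERS = [
    TrainingCluster("7B", nodes=4),
    TrainingCluster("32B", nodes=8),
]

for cluster in CLUSTERS:
    print(
        f"{cluster.variant}: {cluster.total_gpus} GPUs, "
        f"{cluster.total_ram_gb} GB RAM across {cluster.nodes} nodes"
    )
```

Under these assumptions the 7B configuration needs about 4800 GB of RAM in total and the 32B configuration about 9600 GB, which is worth checking against a cluster's node specification before planning a fine-tuning run.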
Features: Tool use · Fine-tuning
Modalities
Input
text, image
Output
text, code
Capabilities and applications
Native model capabilities
Multimodal understanding
Category: multimodal
Benchmark results
1 benchmark
RealX-Bench
n/a
DeepEyesV2 paper (arXiv:2511.05271)
Team's own benchmark introduced alongside the model; evaluates integrated perception, search and reasoning on real-world tasks.