Meituan open-sources LongCat-2.0: 1.6T MoE on Chinese ASICs beats GPT-5.5

Meituan has unveiled LongCat-2.0, a Mixture-of-Experts model with 1.6 trillion total parameters trained entirely on Chinese ASIC clusters, and announced its release under the MIT license, with the weights still to be published. The model spent the past two months leading the OpenRouter platform under the anonymous alias "Owl Alpha" before the company revealed its identity. On SWE-bench Pro, which measures autonomous resolution of real-world software engineering tasks, LongCat-2.0 scored 59.5 points, edging out OpenAI's GPT-5.5 at 58.6.

Key takeaways

  • 1.6 trillion parameters (MoE), ~48B active per token on average; context window: 1 million tokens
  • Trained on a cluster of 50,000+ domestic Chinese ASICs, without Nvidia GPUs
  • SWE-bench Pro: 59.5 pts (LongCat-2.0) vs. 58.6 pts (GPT-5.5)
  • As anonymous "Owl Alpha" on OpenRouter: ~559B tokens/day, 242% month-over-month growth
  • MIT license — model weights coming soon to GitHub and Hugging Face

Architecture: sparse attention and idle compute elimination

The core of LongCat-2.0 is aggressive MoE sparsity: of 1.6 trillion total parameters, only 33–56B are active per token (48B on average), about 3% of the total. Meituan calls the mechanism "Zero-Compute Experts": routine queries route through lighter subnetworks, so the model avoids spending full compute on every token the way a dense model does.
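
The routing idea can be illustrated with a toy sketch. The article does not describe how Zero-Compute Experts are implemented, so this assumes that such an expert simply passes the token through unchanged and therefore contributes no expert parameters to the forward pass. The function names, expert indices, and parameter units below are hypothetical; only the 48B and 1.6T figures come from the article.

```python
# Toy sketch of MoE routing with zero-compute experts.
# Assumption (not from the article): a zero-compute expert is a pass-through,
# so selecting it adds no expert parameters to the forward pass.

# Figures from the article: 48B active on average out of 1.6T total.
ACTIVE_FRACTION = 48e9 / 1.6e12  # about 0.03, i.e. roughly 3%


def route_token(scores, top_k, zero_expert_ids):
    """Return (real, skipped): the top_k experts split by whether they do work."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    real = [i for i in ranked if i not in zero_expert_ids]
    skipped = [i for i in ranked if i in zero_expert_ids]
    return real, skipped


def active_params(real_experts, params_per_expert, shared_params):
    """Parameters touched for one token: shared layers plus each real expert."""
    return shared_params + len(real_experts) * params_per_expert


ZERO_EXPERTS = {1, 3}  # hypothetical: experts 1 and 3 are zero-compute
easy_scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7]  # router favors zero-compute experts
hard_scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.05]  # router favors real experts

for name, scores in [("easy", easy_scores), ("hard", hard_scores)]:
    real, skipped = route_token(scores, top_k=3, zero_expert_ids=ZERO_EXPERTS)
    # Toy units: 10 shared parameters, 2 per real expert.
    print(name, real, skipped, active_params(real, params_per_expert=2, shared_params=10))
```

In this toy setup the "easy" token touches 12 parameter units and the "hard" token 16, which mirrors how a per-token active count could vary across a range such as 33–56B.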

Sustaining a functional 1-million-token context window without memory bottlenecks is the job of LongCat Sparse Attention (LSA). LSA addresses the quadratic cost of attention scoring through three independent mechanisms: Streaming-aware Indexing (sequential HBM-aligned data reads replacing fragmented random access), Cross-Layer Indexing (one indexing pass covering multiple consecutive layers), and Hierarchical Indexing (two-stage coarse-to-fine scoring). An integrated N-gram Embedding module adds ~135B parameters of 5-gram embeddings to the embedding space, orthogonal to the MoE expert layout.
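
A generic two-stage, coarse-to-fine selection can be sketched as follows. This is not Meituan's LSA implementation, whose details the article does not give. It assumes that blocks are summarized by their mean key vector and that relevance is a plain dot product. The function `hierarchical_select` and all of its parameters are hypothetical.

```python
# Generic coarse-to-fine token selection for sparse attention (pure Python).
# Assumptions: mean-pooled block summaries and dot-product relevance scores.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hierarchical_select(query, keys, block_size, top_blocks, top_tokens):
    """Return sorted indices of the tokens the query should attend to."""
    # Split token indices into contiguous blocks.
    blocks = [list(range(start, min(start + block_size, len(keys))))
              for start in range(0, len(keys), block_size)]

    def block_mean(indices):
        return [sum(keys[i][d] for i in indices) / len(indices)
                for d in range(len(query))]

    # Stage 1 (coarse): rank whole blocks by their summary vector.
    kept = sorted(blocks, key=lambda idx: dot(query, block_mean(idx)),
                  reverse=True)[:top_blocks]

    # Stage 2 (fine): score individual tokens only inside the kept blocks.
    candidates = [i for idx in kept for i in idx]
    best = sorted(candidates, key=lambda i: dot(query, keys[i]),
                  reverse=True)[:top_tokens]
    return sorted(best)


query = [1.0, 0.0]
keys = [[0.1, 0.0], [0.2, 0.0], [0.9, 0.0], [0.8, 0.0],
        [0.0, 1.0], [0.1, 1.0], [0.7, 0.0], [0.3, 0.0]]
print(hierarchical_select(query, keys, block_size=2, top_blocks=2, top_tokens=3))
```

Here the fine stage scores only the 4 tokens in the two surviving blocks rather than all 8, which is the source of the savings: per-token scoring cost scales with the number of kept blocks, not with the full context length.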

Post-training: three isolated expert clusters

Instead of a single unified RLHF signal, Meituan used MOPD (Multi-teacher Optimization via Mixture of Specialized Experts). Post-training runs in three isolated clusters: Agent Experts (precise tool invocation, multi-turn API parsing, self-correction loops), Reasoning Experts (chain-of-thought, math, multi-hop logic), and Interaction Experts (instruction following, factual grounding, safety guardrails). At inference time, gate-routing merges these clusters without cross-cluster degradation.

LongCat-2.0 scores 59.5 on SWE-bench Pro, marginally above GPT-5.5's 58.6. It also posts 70.8 on Terminal-Bench 2.1, 77.3 on SWE-bench Multilingual, and 73.2 on FORTE, a corporate workflow simulator. On broader agentic benchmarks such as BrowseComp the model trails Claude Opus 4.8, but within the narrow domain of software engineering it is competitive with the closed-source frontier.

Commercial model and infrastructure

Model weights are expected to be published "soon" on GitHub and Hugging Face Hub — pages are already live with a "coming soon" notice. Meituan offers two billing tracks: standard pay-as-you-go ($0.75/$2.95 per million input/output tokens) and Token Packs purchased upfront for 30-day windows, sold in four timed flash sales per day (Beijing time). A key differentiator: context cache hits are processed at zero charge, significantly reducing the cost of iterative work on large codebases.
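
To see what free cache hits mean for pay-as-you-go pricing, here is a minimal cost estimate using the listed rates. It assumes cached tokens are simply subtracted from billable input, which the article implies but does not spell out. The function name and the example token counts are hypothetical.

```python
# Rough pay-as-you-go cost estimate using the rates quoted in the article.
# Assumption: cache-hit input tokens are billed at zero and everything else
# is billed at the standard input/output rates.

INPUT_USD_PER_M = 0.75   # USD per million input tokens
OUTPUT_USD_PER_M = 2.95  # USD per million output tokens


def request_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    billable_input = input_tokens - cached_input_tokens
    total = billable_input * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: an 800k-token codebase in context, 4k tokens generated.
cold = request_cost_usd(800_000, 4_000)
warm = request_cost_usd(800_000, 4_000, cached_input_tokens=750_000)
print(round(cold, 4), round(warm, 4))  # prints 0.6118 0.0493
```

Under these assumptions, a follow-up request that reuses 750,000 cached tokens costs about 8% of the cold request, which is why the cache policy matters for iterative work on large codebases.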

Training without Nvidia

Training a model of this scale entirely on domestic ASICs is a structural signal for the industry. Until now, frontier-scale training required large clusters of Nvidia GPUs (H100/H200/B200). Meituan demonstrated that a cluster of more than 50,000 Chinese ASICs is sufficient to build a model whose results approach those of closed-source leaders, under conditions where access to Nvidia hardware is restricted by US export controls.

Why it matters

LongCat-2.0 cuts across several important axes simultaneously. First, a 1.6T MoE trained without Nvidia GPUs challenges the assumption that Chinese firms are inevitably dependent on Western compute infrastructure. If Meituan can train a near-frontier model on domestic ASICs, the question becomes how close other Chinese labs are to similar independence.

Second, the timing is deliberate: OpenAI restricted GPT-5.6 access at US government request, and Anthropic previously took Mythos 5 offline entirely. Into that gap, LongCat-2.0 arrives as an open-source, MIT-licensed alternative, available globally without export restrictions. Organizations seeking a high-performance model for autonomous coding now have an option outside closed APIs.

Third, the SWE-bench Pro results suggest that the gap between closed-source models and open alternatives has closed in the domain that matters most to developers: LongCat-2.0 leads GPT-5.5 by 0.9 points (59.5 vs. 58.6).

What's next

  • Full LongCat-2.0 weights are expected on GitHub and Hugging Face — no date given, but repository pages are already live with a "coming soon" notice
  • Meituan signaled continuation of the LongCat line on domestic ASICs — the success of 2.0 increases pressure on other Chinese labs to pursue a similar path
  • US regulators monitoring how GPU export bans affect Chinese AI capabilities are likely to see LongCat-2.0 cited as evidence on both sides of that policy debate
