Reasoning

Adaptive Thinking

2025 · Active · Published: 29 May 2026 · Updated: 29 May 2026

Key innovation

Lets a reasoning model decide per-query whether to emit a long chain-of-thought or skip thinking entirely, adapting the reasoning budget to problem difficulty.

How it works

The model learns to judge query difficulty and choose between at least two generation modes: (1) Thinking: emit a long chain of thought before answering (e.g. content between <think>…</think>); (2) NoThinking: answer directly with an empty or near-empty trace. In AdaptThink, RL with a constrained objective rewards choosing NoThinking as long as overall accuracy does not degrade, while an importance-sampling strategy balances Thinking and NoThinking samples during on-policy training, which makes a cold start possible and keeps both modes explored. In production systems (Claude, GPT-5, Qwen3), the mode choice may instead be driven by a router, a prompt flag, or the API client itself.
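The two modes can be told apart after generation by inspecting the trace. The following is a minimal sketch, assuming responses wrap the trace in <think>…</think> as in the example above; the function name `split_response` and the `max_empty_chars` threshold are hypothetical and not taken from AdaptThink or any product API.

```python
import re

# Matches the first <think>…</think> block, including multi-line traces.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_response(text: str, max_empty_chars: int = 0):
    """Return (mode, trace, answer) for one model response.

    A response counts as NoThinking when it has no trace, or when the
    trace is at most `max_empty_chars` characters long (assumed threshold
    for a "near-empty" trace).
    """
    match = THINK_RE.search(text)
    if match is None:
        # No think block at all: treat the whole text as a direct answer.
        return "NoThinking", "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    mode = "NoThinking" if len(trace) <= max_empty_chars else "Thinking"
    return mode, trace, answer
```

A classifier like this is mainly useful for measurement, for example tracking what fraction of responses skip thinking and how long the remaining traces are.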

Problem solved

Reasoning models (o1, DeepSeek-R1, QwQ) always emit long chains of thought, which inflates inference cost and latency even on trivial tasks where thinking does not improve accuracy. Adaptive Thinking addresses this 'overthinking problem' by letting the model decide whether to think at all.

Implementation

Reference implementations

AdaptThink (THU-KEG)

Python · THU-KEG (Tsinghua University Knowledge Engineering Group)

Official

Implementation pitfalls

Under-trained mode selector collapses to NoThinking · High

Without a constrained objective the model may learn to always skip thinking, hurting accuracy on hard tasks.

Fix: Constrained RL with an accuracy floor (as in AdaptThink) or importance sampling that forces Thinking exploration.
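One way to picture an accuracy floor is as a penalty-style advantage: a small bonus for skipping thinking, plus correctness measured against a reference model's accuracy on the same prompt. The sketch below is an illustration under that assumption, not the AdaptThink code; `delta`, `ref_accuracy`, and the function name are hypothetical.

```python
def adaptive_advantages(samples, ref_accuracy, delta=0.05):
    """Per-sample advantages for one prompt.

    samples: list of (is_nothinking, is_correct) booleans.
    ref_accuracy: assumed accuracy of a frozen reference model on this prompt.
    delta: assumed bonus for NoThinking; it bounds how much accuracy the
           policy may trade away for a shorter response.
    """
    return [
        delta * float(is_nothinking) + float(is_correct) - ref_accuracy
        for is_nothinking, is_correct in samples
    ]


# With ref_accuracy=0.8 and delta=0.05, a wrong NoThinking answer is still
# heavily penalised, so always skipping thinking does not pay off.
advantages = adaptive_advantages(
    [(True, True), (False, True), (True, False)],
    ref_accuracy=0.8,
)
```

Under this form, NoThinking has the higher expected advantage only when its accuracy on the prompt is within `delta` of the Thinking accuracy, which is what keeps the selector from collapsing on hard tasks.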

Cold start: no NoThinking samples in on-policy data · Medium

Reasoning models almost never produce an empty <think>…</think> block, so on-policy RL rarely, if ever, sees NoThinking samples.

Fix: Use importance sampling and inject synthetic NoThinking trajectories during cold start.
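A simple way to guarantee both modes appear in every batch is to fix the start of each sampled response. The sketch below assumes an even split and assumes that an empty <think></think> prefix forces NoThinking while an open <think> forces Thinking; the function name and the 50/50 ratio are illustrative choices, not taken from the AdaptThink repository.

```python
import random


def forced_mode_prefixes(n_samples, rng=None):
    """Return n_samples response prefixes, about half forced to each mode."""
    if rng is None:
        rng = random.Random()
    half = n_samples // 2
    # Empty trace forces NoThinking; an open tag lets the model reason.
    prefixes = ["<think></think>"] * half + ["<think>"] * (n_samples - half)
    rng.shuffle(prefixes)
    return prefixes


# Each prefix would be appended to the prompt before sampling the rest
# of the response from the current policy.
batch_prefixes = forced_mode_prefixes(8, random.Random(0))
```

Because these samples no longer come from the policy's own first-token distribution, a trainer using them would need to correct for the mismatch, which is the role importance sampling plays here.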

Evolution

Original paper · 2025 · arXiv:2505.13417 (EMNLP 2025) · Jiajie Zhang

AdaptThink: Reasoning Models Can Learn When to Think

Jiajie Zhang, Nianyi Lin, Lei Hou, Ling Feng, Juanzi Li

2022

Chain-of-Thought Prompting

Wei et al. show LLMs solve harder problems when emitting intermediate reasoning steps.

CoT (concept)

2024

OpenAI o1 and reasoning models

Inflection point

Reasoning models (o1, later DeepSeek-R1) always emit a long chain of thought, exposing the 'overthinking' problem on easy queries.

2025

AdaptThink (RL-based mode switching)

Inflection point

Zhang et al. formalize adaptive switching between Thinking and NoThinking as an RL algorithm with a constrained objective, reporting a 53% reduction in response length together with a 2.4-percentage-point accuracy gain on DeepSeek-R1-Distill-Qwen-1.5B.

2025

Survey: Concise and Adaptive Thinking in LRMs

Comprehensive survey of adaptive-thinking methods for efficient reasoning (arXiv:2507.09662).

2025

Product adoption: Claude extended thinking, GPT-5 router, Qwen3 thinking toggle

Adaptive thinking becomes a default inference strategy in commercial reasoning models.