The model learns to judge query difficulty and choose between at least two generation modes: (1) Thinking: emit a long chain of thought before answering (e.g. content between <think>…</think>); (2) NoThinking: answer directly with an empty or near-empty trace. In AdaptThink, RL with a constrained objective rewards choosing NoThinking as long as overall accuracy does not drop, while an importance-sampling strategy balances Thinking and NoThinking samples during on-policy training, which enables a cold start and keeps both modes explored. In production systems (Claude, GPT-5, Qwen3) the mode choice may instead be driven by a router, a prompt flag, or the API client itself.
Reasoning models (o1, R1, QwQ) always emit long chains of thought, which inflates inference cost and latency, even on trivial tasks where thinking does not improve accuracy. Adaptive Thinking addresses this 'overthinking problem' by letting the model decide whether to think at all.
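The constrained objective can be pictured as a per-sample advantage: a small bonus for choosing NoThinking, plus the task reward measured against a frozen reference policy. The sketch below is a simplified illustration of that idea, not AdaptThink's actual implementation; the names `Sample`, `adaptive_advantage`, `ref_mean_reward`, and the default `delta` value are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    no_thinking: bool  # True if the response skipped the reasoning trace
    reward: float      # task reward, e.g. 1.0 for a correct answer, 0.0 otherwise


def adaptive_advantage(sample: Sample, ref_mean_reward: float, delta: float = 0.05) -> float:
    """Hypothetical advantage: NoThinking bonus plus reward relative to a reference.

    ref_mean_reward is the reference policy's average reward on the same query.
    delta (assumed value) controls how strongly NoThinking is preferred.
    """
    bonus = delta if sample.no_thinking else 0.0
    return bonus + sample.reward - ref_mean_reward


# Easy query (reference is usually right): a correct NoThinking answer
# scores slightly higher than a correct Thinking answer.
easy_direct = adaptive_advantage(Sample(no_thinking=True, reward=1.0), ref_mean_reward=0.9)
easy_think = adaptive_advantage(Sample(no_thinking=False, reward=1.0), ref_mean_reward=0.9)

# Hard query: a wrong NoThinking answer is penalised despite the bonus.
hard_direct = adaptive_advantage(Sample(no_thinking=True, reward=0.0), ref_mean_reward=0.6)
hard_think = adaptive_advantage(Sample(no_thinking=False, reward=1.0), ref_mean_reward=0.6)
```

Because the bonus is small relative to the reward, skipping the trace only pays off when the direct answer is as accurate as the reference; on hard queries the lost reward outweighs it.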
Without a constrained objective the model may learn to always skip thinking, hurting accuracy on hard tasks.
Reasoning models almost never produce an empty <think></think> block, so plain on-policy RL rarely sees NoThinking samples to learn from (a cold-start problem).
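One way to picture the importance-sampling remedy: during rollouts the mode is drawn from a fixed behaviour distribution rather than from the policy, and each sample is reweighted by the ratio of policy probability to behaviour probability. The sketch below is a minimal illustration under that assumption; `sample_mode`, `importance_ratio`, and the 50/50 split are hypothetical choices for this example, not a documented API.

```python
import random


def sample_mode(rng: random.Random, p_no_thinking: float = 0.5) -> str:
    """Draw the rollout mode from a fixed behaviour distribution (assumed 50/50)."""
    return "no_thinking" if rng.random() < p_no_thinking else "thinking"


def importance_ratio(policy_prob_of_mode: float, mode: str, p_no_thinking: float = 0.5) -> float:
    """Ratio of the policy's probability of this mode to the behaviour probability.

    This ratio would take the place of the usual new/old policy ratio for the
    mode-selecting step in a PPO-style loss.
    """
    behaviour_prob = p_no_thinking if mode == "no_thinking" else 1.0 - p_no_thinking
    return policy_prob_of_mode / behaviour_prob


rng = random.Random(0)
modes = [sample_mode(rng) for _ in range(1000)]
no_thinking_share = modes.count("no_thinking") / len(modes)

# A policy that picks NoThinking only 1% of the time still receives
# NoThinking samples, each down-weighted by its importance ratio.
ratio = importance_ratio(policy_prob_of_mode=0.01, mode="no_thinking")
```

Forcing roughly half of the rollouts into each mode guarantees the learner sees both from the first step, while the ratio keeps the gradient estimate consistent with the policy actually being trained.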
Wei et al. show that LLMs solve harder problems when prompted to emit intermediate reasoning steps (chain-of-thought prompting).
Reasoning models (o1, later DeepSeek-R1) always emit a long chain of thought, exposing the 'overthinking' problem on easy queries.
Zhang et al. (AdaptThink) formalize adaptive switching between Thinking and NoThinking as an RL algorithm with a constrained objective, reporting a 53% reduction in response length together with a 2.4-point accuracy gain on DeepSeek-R1-Distill-Qwen-1.5B.
Comprehensive survey of adaptive-thinking methods for efficient reasoning (arXiv:2507.09662).
Adaptive thinking becomes a default inference strategy in commercial reasoning models.
The mode decision is made per query, not globally.
The model emits a mode-selection token or signal (e.g. empty <think></think> for NoThinking) based on its own assessment of query difficulty.
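On the consumer side, the chosen mode can be recovered by inspecting the trace. The following is a minimal sketch that assumes the <think>…</think> tag convention mentioned above; `detect_mode` is a hypothetical helper, and treating a response with no think block at all as NoThinking is an assumption of this example.

```python
import re

# Non-greedy match so only the first think block is captured; DOTALL lets
# the trace span multiple lines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def detect_mode(response: str) -> str:
    """Classify a response as 'Thinking' or 'NoThinking' from its trace."""
    match = THINK_BLOCK.search(response)
    # Missing block (assumption) or whitespace-only trace counts as NoThinking.
    if match is None or not match.group(1).strip():
        return "NoThinking"
    return "Thinking"


direct = detect_mode("<think></think>The answer is 4.")
reasoned = detect_mode("<think>2 + 2 means adding two twice.\nThat gives 4.</think>The answer is 4.")
```

A stricter variant could also treat very short traces as near-empty, but the threshold for that would be a separate design choice.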
Generation itself is sequential (autoregressive), but skipping the long chain-of-thought in NoThinking yields a much shorter sequence and higher batch throughput.