Reasoning

CMC (Confidence Momentum Controller)

Published: 30 May 2026

Key innovation

Momentum-based stopping gate: instead of thresholding the instantaneous majority confidence, the controller maintains an EMA of pool confidence and stops only when the EMA level is high AND its trend (EMA delta) is non-decreasing — eliminating spurious early stops on single-round confidence spikes. The same trend signal also couples width and depth control decisions.

How it works

Inference proceeds in rounds, capped at _MAX_OUTER outer steps. In phase 0 the controller opens n_init parallel reasoning branches. Each round then runs the following steps:

(1) Compute pool statistics over completed answers (winner, top1, top2, Beta-majority confidence).

(2) Update the EMA of pool confidence with coefficient ema_alpha.

(3) Once warm_up has elapsed, classify active branches as aligned, deviant, or neutral, incrementing disagree_rounds for deviants.

(4) Abandon branches whose disagree_rounds ≥ abandon_patience, while keeping at least 2 active.

(5) Allocate probe_budget across active branches sorted by probe_count descending (most-invested first), with a burst_aligned multiplier for aligned branches.

(6) Update the EMA again.

(7) Compute ema_delta = ema_history[-1] − ema_history[0] over a T_ema window.

(8) Evaluate the gate: gate_fires := warm_enough ∧ n_complete ≥ min_complete ∧ ema_conf ≥ conf_thresh ∧ ema_delta ≥ −delta_slack. If it fires, return pool_winner.

(9) If the gate did not fire, check widening: when ema_delta ≤ trend_thresh (flat or negative trend) and ema_conf < conf_thresh, spawn widen_burst new branches, up to the max_branch_use cap.

The loop terminates when the gate fires, when all branches are resolved, or when outer_step reaches _MAX_OUTER, in which case it returns the majority vote over the branches' latest_ans values.
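The EMA update, the windowed trend, the stopping gate, and the widening check described above can be sketched roughly as follows. This is a minimal illustration, not the OptimalController source: the class name MomentumGate, the method names, seeding the EMA with the first observation, and updating the EMA once per call (the text describes two updates per round) are all assumptions made for brevity. The parameter values in the usage lines are what the β = 0.5 schedule formulas listed under Hyperparameters evaluate to.

```python
from collections import deque


class MomentumGate:
    """Illustrative EMA-momentum gate; identifiers mirror the text, wiring is assumed."""

    def __init__(self, ema_alpha, conf_thresh, delta_slack, trend_thresh,
                 t_ema, warm_up, min_complete):
        self.ema_alpha = ema_alpha
        self.conf_thresh = conf_thresh
        self.delta_slack = delta_slack
        self.trend_thresh = trend_thresh
        self.warm_up = warm_up
        self.min_complete = min_complete
        self.ema_conf = None
        # Only the last t_ema EMA values are kept for the trend estimate.
        self.ema_history = deque(maxlen=t_ema)

    def update(self, pool_conf):
        """Fold one pool-confidence observation into the EMA."""
        if self.ema_conf is None:
            self.ema_conf = pool_conf  # assumption: seed with the first value
        else:
            a = self.ema_alpha
            self.ema_conf = a * pool_conf + (1.0 - a) * self.ema_conf
        self.ema_history.append(self.ema_conf)
        return self.ema_conf

    def ema_delta(self):
        """Newest minus oldest EMA value inside the window (0.0 if too short)."""
        if len(self.ema_history) < 2:
            return 0.0
        return self.ema_history[-1] - self.ema_history[0]

    def gate_fires(self, outer_step, n_complete):
        """Stop only when level is high AND trend is not falling beyond the slack."""
        if self.ema_conf is None:
            return False
        return (outer_step >= self.warm_up
                and n_complete >= self.min_complete
                and self.ema_conf >= self.conf_thresh
                and self.ema_delta() >= -self.delta_slack)

    def should_widen(self):
        """Widen when the trend is flat or negative and confidence is still low."""
        if self.ema_conf is None:
            return False
        return (self.ema_delta() <= self.trend_thresh
                and self.ema_conf < self.conf_thresh)


gate = MomentumGate(ema_alpha=0.5, conf_thresh=0.91, delta_slack=0.025,
                    trend_thresh=0.025, t_ema=5, warm_up=6, min_complete=4)
for conf in [0.5, 0.98, 0.5]:           # a single-round spike
    gate.update(conf)
print(gate.gate_fires(outer_step=10, n_complete=10))  # spike alone does not stop
```

With these values a one-round spike to 0.98 lifts the EMA only to 0.74, so the level condition fails even though the instantaneous confidence would have passed a 0.91 threshold.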

Problem solved

Earlier adaptive TTS controllers (ASC, ESC, Parallel-Probe, and early proposals IBC/SCR/DGCC) rely on instantaneous pool confidence: a lucky early cluster of identical answers can fire the stopping gate before the answer distribution has stabilised, causing premature stops. Furthermore, width decisions (how many branches to spawn) and depth decisions (how much to probe) are decoupled, so budget behavior fails to react to actual progress in evidence quality. CMC addresses both: the momentum gate requires high level AND non-decreasing EMA trend simultaneously, and widening is coupled to the same trend signal.

Key mechanisms

EMA-momentum stopping gate — requires both high EMA confidence level and non-decreasing trend (anti-spike).

Coupled width–depth control — widening decisions are driven by the same EMA trend signal as the stopping gate.

Alignment-aware depth allocation — branches matching the pool winner receive a burst_aligned probe multiplier.

Probe-age priority scheduling — next probe_budget allocated via a priority queue sorted by probe_count descending.

Three-tier branch classification — aligned / neutral / deviant with a disagree_rounds counter.

Conservative branch abandonment — only after abandon_patience consecutive deviant rounds, always keeping ≥ 2 active.

Single-knob β scheduling — every hyperparameter is a monotonic function of one scalar β ∈ [0, 1].
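The three-tier classification and the probe-count-first allocation listed above might look something like the sketch below. The exact classification rule is not spelled out in this summary, so the rule used here (no answer yet means neutral, an answer equal to the pool winner means aligned, anything else means deviant) is an assumption, as are the Branch dataclass, the function names, and the choice of one probe step per non-aligned branch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Branch:
    """Hypothetical per-branch bookkeeping record."""
    branch_id: int
    latest_ans: Optional[str] = None
    probe_count: int = 0
    disagree_rounds: int = 0


def classify(branch, pool_winner):
    """Assumed rule: no answer -> neutral, matches winner -> aligned, else deviant."""
    if branch.latest_ans is None or pool_winner is None:
        return "neutral"
    return "aligned" if branch.latest_ans == pool_winner else "deviant"


def allocate_probes(branches, pool_winner, probe_budget, burst_aligned):
    """Return {branch_id: probe steps} for one round.

    Branches are visited most-invested first (probe_count descending).
    Aligned branches request burst_aligned steps, all others one step,
    and allocation stops when the budget is exhausted.
    """
    plan = {}
    remaining = probe_budget
    for b in sorted(branches, key=lambda br: br.probe_count, reverse=True):
        if remaining <= 0:
            break
        want = burst_aligned if classify(b, pool_winner) == "aligned" else 1
        steps = min(want, remaining)
        plan[b.branch_id] = steps
        remaining -= steps
    return plan


branches = [
    Branch(0, latest_ans="42", probe_count=5),
    Branch(1, latest_ans="17", probe_count=9),
    Branch(2, latest_ans=None, probe_count=1),
]
print(allocate_probes(branches, pool_winner="42", probe_budget=4, burst_aligned=2))
```

In this example the deviant branch is served first because it has the highest probe_count, but the aligned branch still receives the larger share of steps.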

Strengths & limitations

Strengths

✓ ~69.5% token savings vs Self-Consistency K=64 at β≈0.5 with matched mean held-out accuracy (AIME25, HMMT25).

✓ Pareto-dominates every hand-crafted baseline (SC, ASC, ESC, Parallel-Probe) on most configurations.

✓ Single interpretable knob β: operators tune the cost/quality trade-off without managing a dozen thresholds.

✓ Training-free and model-agnostic: runs on top of existing reasoning models (all Qwen3 scales).

✓ Anti-spike: the momentum gate does not fire on a single random answer cluster.

✓ The discovery procedure (AutoTTS) costs only $39.9 and 160 wall-clock minutes, a low barrier to automated controller design.

Limitations

✗ Discovered specifically against the math reasoning task family (AIME / HMMT, Qwen3); generalization to commonsense or code tasks needs separate validation.

✗ The EMA gate introduces a stopping delay relative to instantaneous confidence, so on very easy instances it can be slightly slower than ASC.

✗ Requires a replay environment with archived traces to reproduce or re-discover the controller (the AutoTTS ecosystem).

✗ The burst_aligned multiplier favors the current pool winner. For tasks where the correct answer requires a rarer, non-aligned trajectory, this bias may reduce recall of the correct answer.

✗ Many hyperparameters (n_init, max_branch_use, warm_up, abandon_patience, T_ema, ema_alpha, conf_thresh, delta_slack, burst_aligned, widen_burst, trend_thresh, min_complete) are hidden behind β, and the default monotonic schedule needs recalibration for a new task family.

Implementation

Reference implementations

zhengkid/AutoTTS — OptimalController (CMC)

Python · Tong Zheng et al.

Official

AutoTTS — project page

Authors

Official

Implementation pitfalls

Overly aggressive ema_alpha (low inertia) · High

A high alpha (≈0.7) makes the EMA degenerate to near-instantaneous confidence and kills the anti-spike effect: the gate may fire on a single random answer cluster.

Fix: Stick to the CMC schedule, ema_alpha = 0.70 − 0.40·β. For tasks with strong early noise, raise β.
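To see why alpha matters, the short experiment below runs the same confidence sequence through the EMA at the two ends of the schedule. The sequence is made up for illustration, and the helper names ema_trace and scheduled_alpha are hypothetical; only the formula ema_alpha = 0.70 − 0.40·β comes from the schedule.

```python
def ema_trace(confidences, ema_alpha):
    """Return the running EMA after each observation (first value seeds the EMA)."""
    ema, trace = None, []
    for c in confidences:
        ema = c if ema is None else ema_alpha * c + (1.0 - ema_alpha) * ema
        trace.append(ema)
    return trace


def scheduled_alpha(beta):
    """ema_alpha as a function of beta, per the schedule quoted in the fix."""
    return 0.70 - 0.40 * beta


# Hypothetical pool confidence: steady at 0.6 with one lucky spike to 0.99.
spiky = [0.6, 0.6, 0.6, 0.99, 0.6]

for beta in (0.0, 1.0):
    alpha = scheduled_alpha(beta)
    peak = max(ema_trace(spiky, alpha))
    print(f"beta={beta:.1f}  alpha={alpha:.2f}  peak EMA={peak:.3f}")
```

At alpha = 0.70 the spike pushes the EMA to roughly 0.87, which is above the β = 0 gate threshold of 0.85; at alpha = 0.30 the same spike peaks near 0.72 and stays well below any scheduled threshold.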

Failing to preserve at least 2 active branches · High

A naive port of branch abandonment may cut all but one branch. The controller then has no answer pool to pick a winner from and degenerates to single-path reasoning.

Fix: Preserve the invariant max_abandon = max(0, n_alive − 2), as in OptimalController.
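A minimal sketch of abandonment that respects this invariant is shown below. The function name select_abandoned, the dict-based input, and the tie-breaking choice of abandoning the longest-deviating branches first are assumptions; only the cap max_abandon = max(0, n_alive − 2) and the abandon_patience threshold come from the description.

```python
def select_abandoned(disagree_rounds, abandon_patience):
    """Pick branches to abandon without dropping below two alive branches.

    disagree_rounds maps branch_id -> deviant-round counter for every
    branch that is currently alive.
    """
    n_alive = len(disagree_rounds)
    max_abandon = max(0, n_alive - 2)  # invariant: always keep >= 2 active
    candidates = [bid for bid, d in disagree_rounds.items()
                  if d >= abandon_patience]
    # Assumption: when the cap binds, drop the longest-deviating branches first.
    candidates.sort(key=lambda bid: disagree_rounds[bid], reverse=True)
    return candidates[:max_abandon]


# Three alive branches, all past patience: only one may be abandoned.
print(select_abandoned({0: 5, 1: 6, 2: 7}, abandon_patience=3))
```

Capping the slice rather than filtering afterwards keeps the invariant true even when every alive branch has exceeded abandon_patience.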

Pairing with a different backbone without β recalibration · Medium

The β schedule was discovered on Qwen3 + AIME24. Different backbones or task families may need conf_thresh / ema_alpha shifts to avoid regression.

Fix: Rebuild the replay store for the new backbone and rerun the AutoTTS discovery loop, or recalibrate β on your own hold-out set.

Evolution

Original paper · 2026 · arXiv preprint · Tong Zheng

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Tong Zheng, Haolin Liu, Chengsong Huang, Huiwen Bao, Sheng Zhang, Rui Liu, Runpeng Dai, Ruibo Chen, Chenxi Liu, Tianyi Xiong, Xidong Wu, Hongming Zhang, Heng Huang

2022

Self-Consistency — majority-voting foundation

Wang et al. introduce multi-path CoT sampling with majority voting — the baseline benchmark CMC ultimately surpasses on the Pareto frontier.

Self-Consistency (concept)

2023

Adaptive-Consistency (ASC) — adaptive per-instance stopping

Aggarwal et al. introduce instantaneous Beta-majority confidence with a stopping threshold — the first adaptive baseline that CMC supersedes with a momentum gate.

Adaptive-Consistency (concept)

2026

Parallel-Probe — 2D width-and-depth probing

Zheng et al. introduce an explicit width-and-depth axis with consensus-based pruning — a direct structural ancestor of CMC.

Parallel-Probe (concept)

2026

AutoTTS released and CMC discovered (arXiv 2605.08083)

Inflection point

A coding agent in a replay environment iteratively discovers CMC: the first TTS controller with an EMA momentum gate and coupled width–depth control, establishing a new Pareto frontier on AIME25/HMMT25.

Sources

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

Paper

arXiv

zhengkid/AutoTTS — official repository (CMC source code)

Code

GitHub

AutoTTS — project page

Website

Authors

Hyperparameters (configurable axes)

β (single knob) · High

Scalar in [0, 1] driving every CMC hyperparameter monotonically: 0 = budget-conservative, 1 = accuracy-first.

0.5

1.0

Initial number of branches · Medium

Number of reasoning branches spawned in phase 0; increases with β (round(2 + 6·β)).

Maximum number of branches · High

Hard upper cap on total branches spawned; increases with β (round(4 + 60·β), up to 64).

EMA alpha (inertia) · High

EMA coefficient on pool confidence; decreases with β (0.70 − 0.40·β), so higher β means more inertia.

Gate confidence threshold · High

EMA level required to stop; increases with β (0.85 + 0.12·β).

Gate trend slack · Medium

Tolerated negative EMA slope at stopping time; decreases with β (0.04 − 0.03·β).

Aligned probe multiplier · Medium

Probe steps per round for aligned branches; increases with β (max(1, round(1 + 2·β))).

Branches per widening event · Medium

How many new branches to spawn when EMA trend is weak; increases with β (max(1, round(1 + 3·β))).

Widening trend threshold · Medium

ema_delta threshold below which widening fires; decreases with β (0.04 − 0.03·β).

Abandonment patience · Medium

Number of deviant rounds before a branch is abandoned; increases with β (max(3, round(3 + 9·β))).

Warm-up rounds · Low

Rounds before gate evaluation and branch classification; increases with β (max(2, round(2 + 8·β))).

EMA window · Low

Window length for computing ema_delta; increases with β (max(2, round(2 + 6·β))).

Min completed before gate · Low

Minimum number of completed answers before the gate may fire; increases with β (max(2, round(2 + 3·β))).
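The schedule formulas listed above can be collected into one function for quick inspection. This is a sketch rather than the repository's code: the function name cmc_schedule is hypothetical, and the mapping from each display name to an identifier (n_init, max_branch_use, and so on) is inferred from the identifier list given under Limitations. Note that Python's built-in round sends exact halves to the nearest even integer, so values such as round(2.5) come out as 2; whether the reference implementation rounds the same way is not stated here.

```python
def cmc_schedule(beta):
    """Map the single knob beta in [0, 1] to the hyperparameters listed above."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {
        "n_init": round(2 + 6 * beta),              # initial number of branches
        "max_branch_use": round(4 + 60 * beta),     # hard cap, 64 at beta = 1
        "ema_alpha": 0.70 - 0.40 * beta,            # lower alpha = more inertia
        "conf_thresh": 0.85 + 0.12 * beta,          # EMA level required to stop
        "delta_slack": 0.04 - 0.03 * beta,          # tolerated negative trend
        "burst_aligned": max(1, round(1 + 2 * beta)),
        "widen_burst": max(1, round(1 + 3 * beta)),
        "trend_thresh": 0.04 - 0.03 * beta,         # widening trend threshold
        "abandon_patience": max(3, round(3 + 9 * beta)),
        "warm_up": max(2, round(2 + 8 * beta)),
        "T_ema": max(2, round(2 + 6 * beta)),       # window for ema_delta
        "min_complete": max(2, round(2 + 3 * beta)),
    }


for name, value in cmc_schedule(0.5).items():
    print(f"{name:>16}: {value}")
```

Printing the schedule at a few β values is a cheap sanity check before recalibrating on a new backbone or task family.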