Reasoning

Adaptive-Consistency

2023 · Published: 30 May 2026

Key innovation

An adaptive, per-instance stopping criterion for Self-Consistency sampling: a Beta posterior over the probability that the current majority answer is the true population majority lets generation stop as soon as confidence exceeds a threshold.

How it works

The algorithm samples chain-of-thought paths exactly as Self-Consistency does. After each new sample it updates the count and frequency of every unique answer, then computes the posterior probability that the current majority answer is the true population majority. The posterior is a Beta distribution (a Dirichlet when all answers are modelled jointly) over the multinomial answer frequencies, with a non-informative or Jeffreys prior. If the posterior mass on this event exceeds a confidence threshold (e.g. 0.95), sampling stops and the majority answer is returned; otherwise sampling continues until the maximum sample cap K is reached. The implementation amounts to a few lines of code on top of the standard Self-Consistency loop.
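The loop described above can be sketched as follows. This is a minimal illustration, not the reference implementation: it assumes a uniform Beta(1, 1) prior and reduces the vote to the top answer versus the runner-up, and the names `majority_confidence`, `adaptive_consistency`, and `sample_answer` (a stand-in for one LLM call plus answer extraction) are hypothetical. The default cap of 40 is an illustrative value for K.

```python
from collections import Counter
from math import comb
from typing import Callable, Hashable, Tuple


def majority_confidence(v1: int, v2: int) -> float:
    """P(p > 1/2) for p ~ Beta(v1 + 1, v2 + 1), i.e. under a uniform prior.

    For integer counts the Beta tail equals a binomial sum, so no
    numerical integration is needed.
    """
    n = v1 + v2 + 1
    return sum(comb(n, j) for j in range(v1 + 1)) / 2 ** n


def adaptive_consistency(
    sample_answer: Callable[[], Hashable],  # hypothetical: one LLM sample -> answer
    max_samples: int = 40,                  # assumed cap K, must be >= 1
    threshold: float = 0.95,
) -> Tuple[Hashable, int]:
    """Sample until the majority answer is confident enough or the cap is hit."""
    counts: Counter = Counter()
    for used in range(1, max_samples + 1):
        counts[sample_answer()] += 1
        top_two = counts.most_common(2)
        v1 = top_two[0][1]
        v2 = top_two[1][1] if len(top_two) > 1 else 0
        if majority_confidence(v1, v2) > threshold:
            break
    # Return the majority vote and the number of samples actually drawn.
    return counts.most_common(1)[0][0], used
```

With a sampler that always returns the same answer, this sketch stops after four samples at a threshold of 0.95; a sampler whose answers stay evenly split runs to the cap.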

Problem solved

Self-Consistency uses a fixed number of samples K regardless of instance difficulty. On easy instances consensus is reached after only a few samples, so the remaining samples are wasted compute; on hard instances K may be too small. Adaptive-Consistency allocates the sampling budget per instance, stopping once it is statistically confident in the current majority answer.

Key mechanisms

Beta-Binomial Bayesian model over the majority answer frequency.

Per-instance stopping criterion based on a posterior confidence threshold (e.g. 0.95).

Adaptive budget: easy instances terminate after 2–5 samples while hard ones get the full K.

Training-free and model-agnostic: layers on any LLM without weight changes.
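One way to write the Beta-Binomial criterion explicitly is given below. It assumes a uniform Beta(1, 1) prior and a reduction to the top answer versus the runner-up, with vote counts v_1 and v_2 (symbols introduced here for illustration); other priors change the Beta parameters.

```latex
\[
P\!\left(p_1 > \tfrac{1}{2} \,\middle|\, v_1, v_2\right)
  = \int_{1/2}^{1} \frac{p^{v_1}(1-p)^{v_2}}{B(v_1+1,\; v_2+1)}\, dp
  = 2^{-(v_1+v_2+1)} \sum_{j=0}^{v_1} \binom{v_1+v_2+1}{j}
\]
% Unanimous case (v_2 = 0):
\[
P\!\left(p_1 > \tfrac{1}{2} \,\middle|\, v_1, 0\right) = 1 - 2^{-(v_1+1)}
\]
```

Under these assumptions, four unanimous samples give 1 - 2^{-5} = 31/32 ≈ 0.969, the first value above a 0.95 threshold; three give 15/16 ≈ 0.938.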

Strengths & limitations

Strengths

✓ ~3× fewer average samples than Self-Consistency with <0.1 pp accuracy drop (reported across 17 tasks).

✓ No training required: works on top of any existing LLM without fine-tuning.

✓ Very simple implementation: a few lines of code on top of the Self-Consistency loop.

✓ Orthogonal to other TTS techniques and compatible with any CoT prompt.

✓ Statistical quality guarantee: the confidence threshold controls the cost/quality trade-off with a single hyperparameter.

Limitations

✗ Requires answers that can be aggregated into discrete classes for majority voting, so it is a poorer fit for open-ended tasks with an unbounded answer space.

✗ Confidence threshold and prior need task-specific tuning; bad values reduce the savings or the quality.

✗ On tasks where consensus is illusory (most paths converge to the same wrong answer), early stopping locks in the error.

✗ The signal is weak when only a few samples have been drawn, so in practice a minimum sample count is needed before the first check.

Implementation

Reference implementations

Pranjal2041/AdaptiveConsistency (GitHub) · Python · Pranjal Aggarwal et al. · Official

Project page: sample-step-by-step.info · Authors · Official

Implementation pitfalls

Overly aggressive confidence threshold · High

A low threshold (e.g. 0.7) causes early stopping on apparent consensus, especially when the base LLM has strong but wrong preferences.

Fix: Start at 0.95 and validate on a hold-out set; consider 0.99 for very hard tasks.
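Validating a threshold on a hold-out set can be done offline by replaying the stopping rule over answers that were already sampled. The sketch below assumes a uniform Beta(1, 1) prior and a top-two reduction of the vote; `replay`, `sweep`, and the toy `holdout` data are hypothetical and only illustrate the trade-off.

```python
from collections import Counter
from math import comb
from typing import Dict, List, Sequence, Tuple


def majority_confidence(v1: int, v2: int) -> float:
    """P(p > 1/2) for p ~ Beta(v1 + 1, v2 + 1) (uniform prior assumed)."""
    n = v1 + v2 + 1
    return sum(comb(n, j) for j in range(v1 + 1)) / 2 ** n


def replay(answers: Sequence[str], threshold: float) -> Tuple[str, int]:
    """Apply the stopping rule to pre-drawn, non-empty answers.

    Returns the majority vote at the stopping point and the samples used.
    """
    counts: Counter = Counter()
    for used, answer in enumerate(answers, start=1):
        counts[answer] += 1
        top = counts.most_common(2)
        v2 = top[1][1] if len(top) > 1 else 0
        if majority_confidence(top[0][1], v2) > threshold:
            break
    return counts.most_common(1)[0][0], used


def sweep(
    holdout: List[Tuple[Sequence[str], str]], thresholds: Sequence[float]
) -> Dict[float, Tuple[float, float]]:
    """Map each threshold to (accuracy, mean samples used) on the hold-out set."""
    report = {}
    for t in thresholds:
        results = [(replay(answers, t), gold) for answers, gold in holdout]
        accuracy = sum(vote == gold for (vote, _), gold in results) / len(results)
        mean_used = sum(used for (_, used), _ in results) / len(results)
        report[t] = (accuracy, mean_used)
    return report


# Toy hold-out set: (pre-sampled answers, gold answer). Illustrative data only.
holdout = [
    (["7", "7", "7", "7", "7", "7"], "7"),
    (["3", "5", "5", "5", "5", "5", "5", "5"], "5"),
]
report = sweep(holdout, [0.7, 0.95])
```

On this toy data a threshold of 0.7 stops after a single sample on both instances and gets the second one wrong, while 0.95 uses more samples and recovers the correct majority.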

No answer normalization before voting · Medium

If mathematically equivalent answers have different string representations (e.g. "1/2" vs "0.5"), they are counted as different answers, which adds noise to the vote distribution and makes the threshold harder to reach.

Fix: Apply answer normalization (number parsing, unit canonicalization) before counting votes.
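A minimal sketch of the number-parsing part of this fix, using only the Python standard library. The helper name `normalize_answer` is hypothetical, commas are assumed to be thousands separators (which is wrong for locales that use a decimal comma), and unit canonicalization is not covered.

```python
from collections import Counter
from fractions import Fraction


def normalize_answer(raw: str) -> str:
    """Map equivalent numeric strings to one canonical key before voting."""
    text = " ".join(raw.strip().split())   # trim and collapse whitespace
    candidate = text.replace(",", "")      # assumption: "," is a thousands separator
    try:
        # Fraction parses "1/2", "0.5", "2.0", "1e3" exactly, without float error.
        return str(Fraction(candidate))
    except (ValueError, ZeroDivisionError):
        # Not a number: fall back to a case-insensitive string key.
        return text.lower()


votes = Counter(normalize_answer(a) for a in ["1/2", "0.5", "2", "2.0", " 0.50 "])
```

Here the three spellings of one half collapse into a single vote bucket, so the majority count grows as it should instead of being split across string variants.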

Checking the stopping criterion too early · Medium

Checking the threshold after just 1–2 samples leads to an over-confident posterior and premature stopping, especially with a strong prior.

Fix: Set min_samples to 3–5 and only then start evaluating the criterion.
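The guard can be sketched as a small wrapper around the stopping check. As before, this assumes a uniform Beta(1, 1) prior and a top-two reduction of the vote; `should_stop` and its defaults are hypothetical.

```python
from collections import Counter
from math import comb


def majority_confidence(v1: int, v2: int) -> float:
    """P(p > 1/2) for p ~ Beta(v1 + 1, v2 + 1) (uniform prior assumed)."""
    n = v1 + v2 + 1
    return sum(comb(n, j) for j in range(v1 + 1)) / 2 ** n


def should_stop(counts: Counter, threshold: float = 0.95, min_samples: int = 3) -> bool:
    """Evaluate the criterion only after min_samples answers have been drawn."""
    if sum(counts.values()) < min_samples:
        return False
    top = counts.most_common(2)
    v2 = top[1][1] if len(top) > 1 else 0
    return majority_confidence(top[0][1], v2) > threshold


one_vote = Counter({"7": 1})
# With a lax threshold, a single sample already passes (0.75 > 0.7) unless guarded.
unguarded = should_stop(one_vote, threshold=0.7, min_samples=1)
guarded = should_stop(one_vote, threshold=0.7, min_samples=3)
```

In this sketch `unguarded` is true and `guarded` is false, which shows how the minimum sample count blocks a stop that rests on a single answer.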

Evolution

Original paper · 2023 · EMNLP 2023 · Pranjal Aggarwal

Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs

Pranjal Aggarwal, Aman Madaan, Yiming Yang, Mausam

2022

Self-Consistency introduces majority voting over CoT paths

Wang et al. show that sampling many CoT paths and majority voting markedly improves LLM reasoning — the direct starting point for Adaptive-Consistency.

Self-Consistency (concept)

2023

Release of Adaptive-Consistency (EMNLP 2023, arXiv 2305.11860)

Inflection point

Aggarwal et al. introduce an adaptive stopping criterion based on a Beta posterior over the majority answer frequency and demonstrate ~3× sample reduction with quality on par with Self-Consistency.