A training-free decoding-calibration framework that fixes two systematic biases
in Masked Diffusion Models, improving reasoning and planning by more than 7% across 3 MDMs and 7 benchmarks.
UNCODE (UNmasking Calibration for DecOding DEbiasing) is a novel, training-free decoding strategy for Masked Diffusion Models (MDMs) that unifies global trajectory planning with content-aware informativeness maximization.
Uncertainty-based samplers, when applied to MDMs, suffer from two systematic decoding biases:
- Rigid Boundary Bias: boundary tokens (BOS/EOS, sentence edges) are decoded first, collapsing decoding into a fixed U-shaped trajectory and committing to an answer before the reasoning is built.
- Trivial Token Bias: high-frequency, low-information tokens (punctuation, spaces, fillers) get over-prioritized, spending the decoding budget on surface structure instead of reasoning content.
UNCODE fixes both with a position-aware weighting mechanism and a calibrated, frequency-aware confidence score, guiding the decoding path and suppressing premature selection of unimportant tokens, with no fine-tuning and no architecture change.
Paper: arXiv:2508.13021 · Project page: passionate11.github.io/Uncode-project-page
- 2026-04-07: Our paper has been accepted to ACL 2026 (Main Conference)!
- 2025-09-12: Release adds enhanced LLaDA decoding support, integrating recent semi- and non-autoregressive sampling strategies: ReMDM, Fast-dLLM, Semi-AR, Margin-, Entropy- and Confidence-based samplers.
- 2025-08-19: Released our paper on arXiv and code on GitHub.
| >7% average gain over the strongest decoding baseline | 3 × 7 MDM backbones × reasoning and planning benchmarks |
| 44.7 vs. 45.3: LLaDA-1.5 + UNCODE rivals autoregressive Qwen-2.5-7B | 0 extra training: plug-and-play, decoding-side only |
git clone https://github.com/NEUIR/Uncode.git
cd Uncode
conda create --name uncode python==3.10
conda activate uncode
pip install -r requirements.txt

UNCODE and all baseline methods can be evaluated across mathematical reasoning, code generation, and question-answering datasets: HumanEval, MBPP, GSM8K, MATH-500, GPQA, Countdown, and Sudoku. Results are saved to the results/ folder.
Change --task and --mode to evaluate on other datasets / decoding methods.
cd scripts
python eval.py \
--task 'humaneval' \
--model_name 'GSAI-ML/LLaDA-8B-Instruct' \
--device 'cuda:0' \
--gen_length 256 \
--steps 256 \
--block_length 256 \
--mode pc_sampler \
--lambd 0.25 \
--alpha 10 \
--data_path ../data/humaneval.jsonl \
--result_path results/humaneval_pc_sampler

| Decoding Method | Command | Decoding Method | Command |
|---|---|---|---|
| Semi-Autoregressive | bash eval_semi_ar.sh | Entropy | bash eval_entropy.sh |
| EB-Sampler | bash eval_eb_sampler.sh | Fast-dLLM | bash eval_fast_dllm.sh |
| Margin | bash eval_margin.sh | PC-Sampler | bash eval_pc_sampler.sh |
| ReMDM | bash eval_remdm.sh | Linear-Position | bash eval_linear_position.sh |
All scripts live in scripts/. Run them from inside that folder (cd scripts).
- GSM8K and GPQA are evaluated with lm-eval; the remaining datasets use scripts/eval.py.
- All methods share the same evaluation scripts to ensure consistent, comparable assessment.
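To compare several decoding modes under identical settings, the eval.py invocation shown earlier can be assembled programmatically. The sketch below is a hypothetical helper (`build_eval_command` is not part of the repository) that only builds the argument list from the flags used in that command. It assumes other datasets follow the same ../data/<task>.jsonl naming as HumanEval, which should be checked against the data/ folder before use.

```python
import shlex


def build_eval_command(task, mode, lambd=0.25, alpha=10,
                       model_name="GSAI-ML/LLaDA-8B-Instruct", device="cuda:0"):
    """Build the argv list for scripts/eval.py (hypothetical helper, not in the repo)."""
    return [
        "python", "eval.py",
        "--task", task,
        "--model_name", model_name,
        "--device", device,
        "--gen_length", "256",
        "--steps", "256",
        "--block_length", "256",
        "--mode", mode,
        "--lambd", str(lambd),
        "--alpha", str(alpha),
        # Assumption: data files are named ../data/<task>.jsonl, as for HumanEval.
        "--data_path", f"../data/{task}.jsonl",
        "--result_path", f"results/{task}_{mode}",
    ]


cmd = build_eval_command("humaneval", "pc_sampler")
# Print a shell-ready command; run it from inside scripts/.
print(shlex.join(cmd))
```

Building the list once and varying only `mode` keeps generation length, step count, and block length fixed across methods, which is what makes the resulting numbers comparable.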
Generate decoding-trajectory heatmaps for different methods:
cd scripts
bash heatmap.sh

Heatmap outputs are saved to the heatmap_results/ folder.
The decoding strategy strongly shapes the generation order of MDMs. Existing uncertainty-based methods exhibit a U-shaped trajectory (the rigid boundary bias): boundary tokens (BOS/EOS) are unmasked early because the attention mechanism's local positional bias inflates their confidence, after which decoding converges inward.
UNCODE instead introduces explicit trajectory control via position-aware weighting, yielding an adaptive generation order tailored to each task. Trajectories on GSM8K for four representative samplers:
- Rigid Boundary Bias: confidence/entropy/margin samplers consistently show the U-shaped pattern, decoding both sequence boundaries first. This limits their ability to capture the global dependencies needed for complex reasoning.
- Trivial Token Bias: uncertainty-based samplers over-prioritize semantically trivial, high-frequency tokens (newlines, spaces, "the", ".", "!"), leading to suboptimal reasoning paths.
- Debiasing with UNCODE: exponential positional weighting removes the U-shape, producing a natural progression aligned with the logical flow of reasoning.
This adaptive trajectory control directly drives UNCODE's strong 82.2% GSM8K accuracy, well above uncertainty-based alternatives.
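The contrast between the U-shaped order and a position-weighted order can be illustrated with a toy example. The confidences below are made up and held static, whereas a real sampler re-scores the remaining masks at every step, so this is only a sketch of the ordering effect and not the repository's implementation.

```python
import numpy as np

# Toy, static confidences for six masked positions: the two boundary
# positions are inflated, mimicking the rigid boundary bias.
confidence = np.array([0.90, 0.60, 0.50, 0.55, 0.65, 0.95])
positions = np.arange(len(confidence))


def unmask_order(scores):
    """Return positions sorted from highest to lowest score."""
    return np.argsort(-scores, kind="stable").tolist()


# Plain confidence-based ordering: both boundaries first, then inward.
greedy_order = unmask_order(confidence)

# Exponential positional decay (lambda = 0.5) applied to the same confidences.
lam = 0.5
weighted_order = unmask_order(np.exp(-lam * positions) * confidence)

print(greedy_order)    # [5, 0, 4, 1, 3, 2]
print(weighted_order)  # [0, 1, 2, 3, 4, 5]
```

With these numbers the unweighted order alternates between the two ends and converges on the middle, while the weighted order becomes left-to-right. Smaller values of lambda give intermediate orders in which confidence still has a say.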
UNCODE addresses the limitations of uncertainty-based sampling through two core components:
- Position-Aware Weighting: an exponential decay over position regulates the decoding path, giving flexible control over generation order to match task structure.
- Calibrated Confidence Score: a frequency-based adjustment from a reference corpus suppresses premature selection of trivial tokens, promoting semantically rich content.
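A minimal sketch of the calibrated confidence score follows. It assumes the salience is the model probability multiplied by the negative log of the token's background frequency, so that frequent tokens score low, and that the result is clipped at alpha. The tokens, probabilities, and background frequencies are hypothetical values chosen for illustration.

```python
import math

ALPHA = 10.0  # clipping threshold (--alpha)

# Hypothetical background frequencies; real values come from a reference corpus.
background_freq = {".": 5e-2, "42": 1e-5, "Fibonacci": 1e-7}


def calibrated_confidence(token, model_prob, alpha=ALPHA):
    """Model probability times negative log background frequency, clipped at alpha."""
    salience = -model_prob * math.log(background_freq[token])
    return min(salience, alpha)


scores = {
    ".": calibrated_confidence(".", 0.99),
    "42": calibrated_confidence("42", 0.80),
    "Fibonacci": calibrated_confidence("Fibonacci", 0.90),
}
for token, score in scores.items():
    print(f"{token!r}: {score:.2f}")
# '.': 2.97
# '42': 9.21
# 'Fibonacci': 10.00
```

Although the model is most confident about the full stop, its high background frequency pulls its score well below that of the content token "42". The clip keeps very rare tokens such as "Fibonacci" from dominating purely because of their rarity.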
Across seven benchmarks, UNCODE consistently outperforms existing MDM decoding strategies, narrowing the gap to state-of-the-art autoregressive models.
Require: Predictor $p_\theta$, prompt $p_0$, answer length $L$, steps $T$, hyperparameters $\lambda$ and $\alpha$, reference corpus $\mathcal{D}'$

- $p_{\mathcal{D}'} \gets \text{FreqDist}(\mathcal{D}')$
- $x \gets \text{Concat}(p_0, \text{[MASK]} \times L)$
- for $t = 1$ to $T$ do
  - $\mathcal{M}_t \gets \{i \mid x^i = \text{[MASK]}\}$ // mask indices
  - if $\mathcal{M}_t = \emptyset$ then break
  - $\hat{x}_0, \hat{p} \gets p_\theta(\cdot \mid x)$
  - for each position $i \in \mathcal{M}_t$ do
    - $\mathcal{C}^{(i)} \gets -\hat{p}^i \cdot \log p_{\mathcal{D}'}(\hat{x}_0^i)$
    - $\mathcal{C}^{(i)} \gets \min(\mathcal{C}^{(i)}, \alpha)$ // clip salience
    - $w^{(i)} \gets e^{-\lambda \cdot (i - |p_0|)}$
    - $\text{score}^{(i)} \gets w^{(i)} \cdot \mathcal{C}^{(i)}$
  - $n_t \gets \text{NumToReveal}(t, T, |\mathcal{M}_t|)$
  - $\mathcal{S}_t \gets \text{TopK}(\text{score}, n_t)$ // select best tokens
  - for each index $j \in \mathcal{S}_t$ do
    - $x^j \gets \hat{x}_0^j$ // reveal
- return $x$
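The loop above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the repository's code: `toy_predictor` is a random stand-in for the MDM forward pass, `MASK = -1` is an arbitrary sentinel, `num_to_reveal` assumes the remaining masks are spread evenly over the remaining steps, the salience uses the negative log background frequency of the predicted token, and the five-token background distribution is invented.

```python
import math
import numpy as np

MASK = -1  # illustrative sentinel id for [MASK]


def toy_predictor(x, vocab_size, rng):
    """Stand-in for the model: random per-position distributions (NOT a real MDM)."""
    logits = rng.normal(size=(len(x), vocab_size))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs.argmax(axis=1), probs.max(axis=1)


def num_to_reveal(t, T, num_masked):
    """Assumed schedule: spread the remaining masks evenly over the remaining steps."""
    return math.ceil(num_masked / (T - t + 1))


def decode(prompt, L, T, lam, alpha, bg_freq, rng):
    x = np.concatenate([prompt, np.full(L, MASK)])
    for t in range(1, T + 1):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        x0_hat, p_hat = toy_predictor(x, len(bg_freq), rng)
        # Calibrated confidence of each predicted token, clipped at alpha.
        conf = np.minimum(-p_hat[masked] * np.log(bg_freq[x0_hat[masked]]), alpha)
        # Exponential positional decay measured from the end of the prompt.
        weight = np.exp(-lam * (masked - len(prompt)))
        score = weight * conf
        # Reveal the n highest-scoring masked positions.
        n = num_to_reveal(t, T, masked.size)
        chosen = masked[np.argsort(-score)[:n]]
        x[chosen] = x0_hat[chosen]
    return x


rng = np.random.default_rng(0)
bg_freq = np.array([0.4, 0.3, 0.2, 0.05, 0.05])  # hypothetical 5-token vocabulary
out = decode(prompt=np.array([1, 2]), L=6, T=3, lam=0.25, alpha=10.0,
             bg_freq=bg_freq, rng=rng)
print(out)
```

In a real setting, `toy_predictor` would be replaced by a forward pass of the MDM that returns the predicted token and its probability at every position; the scoring and selection steps would stay the same.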
| Param | Meaning | Recommended |
|---|---|---|
| $\lambda$ (--lambd) | Positional bias strength: 0 = no bias, larger = stronger left-to-right | 0 (Sudoku), 0.25 (most tasks), 0.5 (Countdown) |
| $\alpha$ (--alpha) | Clipping threshold for the salience score | 10 (stable across tasks) |
| $p_{\mathcal{D}'}$ | Background frequency distribution from a reference corpus | see data/baseline |
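As a rough guide to how $\lambda$ shapes the positional weight $w^{(i)} = e^{-\lambda \cdot (i - |p_0|)}$, consider a token four positions past the prompt (the offset of 4 is an arbitrary illustrative choice):

$$
\begin{aligned}
\lambda = 0 &: \quad w = e^{0} = 1 \\
\lambda = 0.25 &: \quad w = e^{-1} \approx 0.368 \\
\lambda = 0.5 &: \quad w = e^{-2} \approx 0.135
\end{aligned}
$$

At $\lambda = 0.5$, that token needs roughly 7.4 times the calibrated confidence of the first answer token to be selected ahead of it, whereas $\lambda = 0$ leaves the ordering to confidence alone.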
If you find UNCODE useful, please cite:
@inproceedings{huang-etal-2026-empirical,
title = "Empirical Analysis of Decoding Biases in Masked Diffusion Models",
author = "Huang, Pengcheng and Liu, Tianming and Liu, Zhenghao and Yan, Yukun and Wang, Shuo and Xiao, Tong and Chen, Zulong and Sun, Maosong",
booktitle = "Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
year = "2026",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2026.acl-long.311/",
pages = "6853--6876",
}

Questions, suggestions, or bug reports are welcome; please open an issue or email pengcheng.neu@outlook.com.



