UNCODE: Empirical Analysis of Decoding Biases in Masked Diffusion Models

A training-free decoding-calibration framework that fixes two systematic biases
in Masked Diffusion Models — improving reasoning & planning by 7%+ across 3 MDMs and 7 benchmarks.

📖 Introduction • 🎉 News • ⚙️ Setup • 📃 Evaluation • 📈 Trajectory • 💻 Algorithm • 📌 Citation • 📧 Contact

📖 Introduction

UNCODE (UNmasking Calibration for DecOding DEbiasing) is a novel, training-free decoding strategy for Masked Diffusion Models (MDMs) that unifies global trajectory planning with content-aware informativeness maximization.

Uncertainty-based samplers, when applied to MDMs, suffer from two systematic decoding biases:

🔴 Rigid Boundary Bias — boundary tokens (BOS/EOS, sentence edges) are decoded first, collapsing decoding into a fixed U-shaped trajectory and committing to an answer before the reasoning is built.
🟡 Trivial Token Bias — high-frequency, low-information tokens (punctuation, spaces, fillers) get over-prioritized, spending the decoding budget on surface structure instead of reasoning content.

UNCODE fixes both with a position-aware weighting mechanism and a calibrated, frequency-aware confidence score, guiding the decoding path and suppressing premature selection of unimportant tokens — with no fine-tuning and no architecture change.

UNCODE teaser: rigid boundary bias, trivial token bias, and the ideal decoding trajectory.

📄 Paper: arXiv:2508.13021 · 🌐 Project page: passionate11.github.io/Uncode-project-page

🎉 News

2026-04-07: Our paper has been accepted to ACL 2026 (Main Conference)! 🎉
2025-09-12: Release adds enhanced LLaDA decoding support, integrating recent semi- and non-autoregressive sampling strategies: ReMDM, Fast-dLLM, Semi-AR, Margin-, Entropy- and Confidence-based samplers.
2025-08-19: Released our paper on arXiv and code on GitHub.

✨ Highlights


🚀 >7% average gain over the strongest decoding baseline
🧩 3 × 7: 3 MDM backbones × 7 reasoning & planning benchmarks
⚖️ 44.7 ≈ 45.3: LLaDA-1.5 + UNCODE rivals autoregressive Qwen-2.5-7B
🔌 0 extra training: plug-and-play, decoding-side only

⚙️ Setup

git clone https://github.com/NEUIR/Uncode.git
cd Uncode
conda create --name uncode python=3.10
conda activate uncode
pip install -r requirements.txt

📃 Evaluation

UNCODE and all baseline methods can be evaluated on seven datasets covering code generation (HumanEval, MBPP), mathematical reasoning (GSM8K, MATH-500), question answering (GPQA), and planning (Countdown, Sudoku). Results are saved to the results/ folder.

Example — HumanEval with UNCODE

Change --task and --mode to evaluate other datasets or decoding methods. In this example, UNCODE is selected with --mode pc_sampler.

cd scripts
python eval.py \
    --task 'humaneval' \
    --model_name 'GSAI-ML/LLaDA-8B-Instruct' \
    --device 'cuda:0' \
    --gen_length 256 \
    --steps 256 \
    --block_length 256 \
    --mode pc_sampler \
    --lambd 0.25 \
    --alpha 10 \
    --data_path ../data/humaneval.jsonl \
    --result_path results/humaneval_pc_sampler

Baseline decoding methods

Decoding Method	Command	Decoding Method	Command
Semi-Autoregressive	`bash eval_semi_ar.sh`	Entropy	`bash eval_entropy.sh`
EB-Sampler	`bash eval_eb_sampler.sh`	Fast-dLLM	`bash eval_fast_dllm.sh`
Margin	`bash eval_margin.sh`	PC-Sampler	`bash eval_pc_sampler.sh`
ReMDM	`bash eval_remdm.sh`	Linear-Position	`bash eval_linear_position.sh`

All scripts live in scripts/. Run them from inside that folder (cd scripts).

Evaluation tools & consistency

GSM8K and GPQA are evaluated with lm-eval; the remaining datasets use scripts/eval.py.
All methods share the same evaluation scripts to ensure consistent, comparable assessment.

Plotting heatmaps

Generate decoding-trajectory heatmaps for different methods:

cd scripts
bash heatmap.sh

Heatmap outputs are saved to the heatmap_results/ folder.
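The contents of heatmap.sh are not shown here, so the following is only a rough sketch of what a decoding-trajectory heatmap encodes, assuming each generated position is tagged with the step at which it was unmasked. The function name `trajectory_matrix` and the toy input are illustrative and are not taken from the repository.

```python
import numpy as np

def trajectory_matrix(unmask_step, num_steps):
    """Return a (num_steps, length) 0/1 matrix.

    Entry [t, i] is 1 if position i has been unmasked by the end of step t.
    `unmask_step[i]` is the 0-based step at which position i was revealed
    (a hypothetical input format, not the repository's).
    """
    unmask_step = np.asarray(unmask_step)
    steps = np.arange(num_steps)[:, None]  # shape (num_steps, 1)
    return (unmask_step[None, :] <= steps).astype(int)

# Toy U-shaped trajectory over 6 positions: edges first, middle last.
u_shaped = [0, 1, 2, 2, 1, 0]
H = trajectory_matrix(u_shaped, num_steps=3)
print(H)
```

Each row is one decoding step and each column one position, so a U-shaped trajectory shows up as filled cells that grow inward from both edges. A matrix like `H` could then be rendered with any image-plotting routine.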

📈 Decoding Trajectory

The decoding strategy strongly shapes the generation order of MDMs. Existing uncertainty-based methods exhibit a U-shaped trajectory (the rigid boundary bias): boundary tokens (BOS/EOS) are unmasked early because the attention mechanism's local positional bias inflates their confidence, after which decoding converges inward.

UNCODE instead introduces explicit trajectory control via position-aware weighting, yielding an adaptive generation order tailored to each task. Trajectories on GSM8K for four representative samplers:

Figure panels: Confidence-based, Entropy-based, Margin-based, UNCODE.

🔑 Key Observations

Rigid Boundary Bias — confidence/entropy/margin samplers consistently show the U-shaped pattern, decoding both sequence boundaries first. This limits their ability to capture the global dependencies needed for complex reasoning.
Trivial Token Bias — uncertainty-based samplers over-prioritize semantically trivial, high-frequency tokens (newlines, spaces, the, ., !), leading to suboptimal reasoning paths.
Debiasing with UNCODE — exponential positional weighting removes the U-shape, producing a natural progression aligned with the logical flow of reasoning.

With this adaptive trajectory control, UNCODE reaches 82.2% accuracy on GSM8K, above the uncertainty-based alternatives.
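The effect of positional weighting on the decoding order can be illustrated with a toy example. The confidences below are made up, one token is revealed per step, and the scores are held fixed across steps, so this is a simplified sketch rather than the behaviour of a real MDM. The helper name `decode_order` and the value λ = 0.5 are chosen only for illustration.

```python
import numpy as np

def decode_order(scores):
    """Positions sorted from highest to lowest score (greedy reveal order)."""
    return [int(i) for i in np.argsort(-np.asarray(scores), kind="stable")]

# Made-up confidences for 6 answer positions, inflated at both boundaries.
confidence = np.array([0.90, 0.60, 0.40, 0.50, 0.70, 0.95])

# Uncertainty-based order: most confident position first.
confidence_order = decode_order(confidence)
print(confidence_order)

# Position-aware order: multiply by exp(-lambda * offset) before ranking.
lam = 0.5
offsets = np.arange(len(confidence))
weighted_order = decode_order(np.exp(-lam * offsets) * confidence)
print(weighted_order)
```

In this toy setting the plain confidence ranking alternates between the two ends of the sequence and reaches the middle last, while the weighted ranking proceeds left to right.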

💻 Algorithm

UNCODE addresses the limitations of uncertainty-based sampling through two core components:

Position-Aware Weighting — an exponential decay over position regulates the decoding path, giving flexible control over generation order to match task structure.
Calibrated Confidence Score — a frequency-based adjustment from a reference corpus suppresses premature selection of trivial tokens, promoting semantically rich content.
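To make the calibrated confidence score concrete, here is a minimal sketch with made-up numbers. It assumes the score is the model probability multiplied by the negative log of the token's reference-corpus frequency, so that rarer tokens get a larger multiplier; the function name, the sign convention, and the example frequencies are assumptions, not values from the repository.

```python
import math

def calibrated_confidence(model_prob, corpus_freq, alpha=10.0):
    """Model confidence scaled by token rarity, clipped at alpha.

    Uses -log(corpus_freq) so that rarer tokens get a larger multiplier;
    the sign convention is an assumption made so the score is non-negative.
    """
    return min(model_prob * -math.log(corpus_freq), alpha)

# Made-up numbers: a frequent full stop vs. a rarer number token.
period = calibrated_confidence(model_prob=0.99, corpus_freq=0.05)
number = calibrated_confidence(model_prob=0.80, corpus_freq=0.0001)
print(round(period, 3), round(number, 3))
```

Although the model is more confident about the full stop, the rarer number token ends up with the higher calibrated score, which is the intended suppression of trivial tokens. The clip at `alpha` keeps extremely rare tokens from dominating the ranking.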

Across seven benchmarks, UNCODE consistently outperforms existing MDM decoding strategies, narrowing the gap to state-of-the-art autoregressive models.

Workflow

Require: Predictor $p_\theta$, prompt $p_0$, answer length $L$, steps $T$, hyperparameters $\lambda, \alpha$; reference corpus $\mathcal{D}'$

$p_{\mathcal{D}'} \gets \text{FreqDist}(\mathcal{D}')$
$x \gets \text{Concat}(p_0, \text{[MASK]} \times L)$
for $t = 1$ to $T$ do
- $\mathcal{M}_t \gets \{i \mid x^i = \text{[MASK]}\}$ // mask indices
- if $\mathcal{M}_t = \emptyset$ then break
- $\hat{x}_0, \hat{p} \gets p_{\theta}(\cdot \mid x)$ // predicted tokens and their confidences
- for each position $i \in \mathcal{M}_t$ do
  - $\mathcal{C}^{(i)} \gets -\hat{p}^i \cdot \log p_{\mathcal{D}'}(\hat{x}_0^i)$ // calibrated confidence
  - $\mathcal{C}^{(i)} \gets \min(\mathcal{C}^{(i)}, \alpha)$ // clip salience
  - $w^{(i)} \gets e^{-\lambda \cdot (i - |p_0|)}$
  - $\text{score}^{(i)} \gets w^{(i)} \cdot \mathcal{C}^{(i)}$
- $n_t \gets \text{NumToReveal}(t, T, |\mathcal{M}_t|)$
- $\mathcal{S}_t \gets \text{TopK}(\text{score}, n_t)$ // select best tokens
- for each index $j \in \mathcal{S}_t$ do $x^j \gets \hat{x}_0^j$ // reveal
return $x$
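The workflow above can be sketched in NumPy as follows. This is an illustrative reading of the pseudocode, not the code in src/: the `MASK` id, the `predict` interface, the toy predictor, and the even reveal schedule standing in for NumToReveal are all assumptions, and the calibrated confidence uses the negative log-frequency so that the score is non-negative.

```python
import numpy as np

MASK = -1  # hypothetical id for the [MASK] token

def uncode_decode(predict, prompt, length, steps, lam, alpha, freq):
    """Sketch of the workflow above.

    predict(x) -> array of shape (len(x), vocab) with per-position probabilities
    freq       -> reference-corpus frequency of each vocabulary id
    """
    x = np.concatenate([np.asarray(prompt), np.full(length, MASK)])
    for t in range(steps):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        probs = predict(x)
        x0_hat = probs.argmax(axis=1)   # predicted token per position
        p_hat = probs.max(axis=1)       # probability of that token
        # Calibrated confidence, clipped at alpha (negative log assumed).
        conf = np.minimum(p_hat * -np.log(freq[x0_hat]), alpha)
        # Exponential decay over the offset from the end of the prompt.
        weight = np.exp(-lam * (np.arange(len(x)) - len(prompt)))
        score = weight * conf
        # Assumed schedule: spread remaining masks evenly over remaining steps.
        n_reveal = int(np.ceil(masked.size / (steps - t)))
        best = masked[np.argsort(-score[masked], kind="stable")[:n_reveal]]
        x[best] = x0_hat[best]
    return x

def toy_predict(x):
    # Stand-in model: always prefers token (position % 4) with probability 0.7.
    vocab = 4
    probs = np.full((len(x), vocab), 0.1)
    probs[np.arange(len(x)), np.arange(len(x)) % vocab] = 0.7
    return probs

freq = np.array([0.4, 0.3, 0.2, 0.1])
out = uncode_decode(toy_predict, prompt=[0, 1], length=4, steps=4,
                    lam=0.25, alpha=10.0, freq=freq)
print(out)
```

Only masked positions are ranked, so the weights computed for prompt positions are never used. With a real model, `predict` would wrap a forward pass of the MDM and `freq` would come from the reference-corpus frequency distribution.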

Hyperparameters

Param	Meaning	Recommended
$\lambda$ (`--lambd`)	Positional bias strength: `0` = no bias, larger = stronger left-to-right	`0` (Sudoku), `0.25` (most tasks), `0.5` (Countdown)
$\alpha$ (`--alpha`)	Clipping threshold for the salience score	`10` (stable across tasks)
$p_{\mathcal{D}'}$	Background frequency distribution from a reference corpus	see `data/baseline`
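As a rough guide to the scale of $\lambda$, the position weight from the workflow, $w^{(i)} = e^{-\lambda \cdot (i - |p_0|)}$, can be evaluated at a fixed offset for each recommended value. The offset $i - |p_0| = 4$ used below is an arbitrary choice for illustration.

```latex
\begin{aligned}
\lambda = 0:    &\quad w = e^{0} = 1 \\
\lambda = 0.25: &\quad w = e^{-0.25 \cdot 4} = e^{-1} \approx 0.368 \\
\lambda = 0.5:  &\quad w = e^{-0.5 \cdot 4} = e^{-2} \approx 0.135
\end{aligned}
```

With $\lambda = 0$ every position keeps the same weight, so the order is decided by the calibrated confidence alone. Larger values discount later positions more sharply and push decoding toward left-to-right.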

📌 Citation

If you find UNCODE useful, please cite:

@inproceedings{huang-etal-2026-empirical,
    title     = "Empirical Analysis of Decoding Biases in Masked Diffusion Models",
    author    = "Huang, Pengcheng and Liu, Tianming and Liu, Zhenghao and Yan, Yukun and Wang, Shuo and Xiao, Tong and Chen, Zulong and Sun, Maosong",
    booktitle = "Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year      = "2026",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2026.acl-long.311/",
    pages     = "6853--6876",
}

📧 Contact

Questions, suggestions, or bug reports are welcome — please open an issue or email pengcheng.neu@outlook.com.
