NEUIR/Uncode

UNCODE: Empirical Analysis of Decoding Biases in Masked Diffusion Models

A training-free decoding-calibration framework that fixes two systematic biases in Masked Diffusion Models, improving reasoning and planning performance by over 7% on average across 3 MDMs and 7 benchmarks.

📖 Introduction • 🎉 News • ⚙️ Setup • 📃 Evaluation • 📈 Trajectory • 💻 Algorithm • 📌 Citation • 📧 Contact


📖 Introduction

UNCODE (UNmasking Calibration for DecOding DEbiasing) is a novel, training-free decoding strategy for Masked Diffusion Models (MDMs) that unifies global trajectory planning with content-aware informativeness maximization.

Uncertainty-based samplers, when applied to MDMs, suffer from two systematic decoding biases:

  • 🔴 Rigid Boundary Bias: boundary tokens (BOS/EOS, sentence edges) are decoded first, collapsing decoding into a fixed U-shaped trajectory and committing to an answer before the reasoning is built.
  • 🟡 Trivial Token Bias: high-frequency, low-information tokens (punctuation, spaces, fillers) get over-prioritized, spending the decoding budget on surface structure instead of reasoning content.

UNCODE fixes both with a position-aware weighting mechanism and a calibrated, frequency-aware confidence score. Together they guide the decoding path and suppress premature selection of unimportant tokens, with no fine-tuning and no architecture change.

📄 Paper: arXiv:2508.13021 · 🌐 Project page: passionate11.github.io/Uncode-project-page

🎉 News

  • 2026-04-07: Our paper has been accepted to ACL 2026 (Main Conference)! 🎉
  • 2025-09-12: Release adds enhanced LLaDA decoding support, integrating recent semi- and non-autoregressive sampling strategies: ReMDM, Fast-dLLM, Semi-AR, Margin-, Entropy- and Confidence-based samplers.
  • 2025-08-19: Released our paper on arXiv and code on GitHub.

✨ Highlights

  • 🚀 >7% average gain over the strongest decoding baseline
  • 🧩 3 × 7: MDM backbones × reasoning & planning benchmarks
  • ⚖️ 44.7 ≈ 45.3: LLaDA-1.5 + UNCODE rivals autoregressive Qwen-2.5-7B
  • 🔌 0 extra training: plug-and-play, decoding-side only

⚙️ Setup

git clone https://github.com/NEUIR/Uncode.git
cd Uncode
conda create --name uncode python==3.10
conda activate uncode
pip install -r requirements.txt

📃 Evaluation

UNCODE and all baseline methods can be evaluated across mathematical reasoning, code generation, and question-answering datasets: HumanEval, MBPP, GSM8K, MATH-500, GPQA, Countdown, and Sudoku. Results are saved to the results/ folder.

Example: HumanEval with UNCODE

The command below runs UNCODE (--mode pc_sampler) on HumanEval. Change --task and --mode to evaluate on other datasets or decoding methods.

cd scripts
python eval.py \
    --task 'humaneval' \
    --model_name 'GSAI-ML/LLaDA-8B-Instruct' \
    --device 'cuda:0' \
    --gen_length 256 \
    --steps 256 \
    --block_length 256 \
    --mode pc_sampler \
    --lambd 0.25 \
    --alpha 10 \
    --data_path ../data/humaneval.jsonl \
    --result_path results/humaneval_pc_sampler
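
To compare several values of --lambd, the command above can be generated programmatically. The following is a minimal sketch that only assembles argument lists from the flags shown above. The helper name `build_eval_command` and the per-run result directory naming are illustrative assumptions, not part of the repository.

```python
import shlex


def build_eval_command(lambd, alpha=10):
    """Build the eval.py argument list shown above for one value of --lambd.

    Hypothetical helper: the flags come from the README example; the
    result-path suffix is an assumption made for this sketch.
    """
    return [
        "python", "eval.py",
        "--task", "humaneval",
        "--model_name", "GSAI-ML/LLaDA-8B-Instruct",
        "--device", "cuda:0",
        "--gen_length", "256",
        "--steps", "256",
        "--block_length", "256",
        "--mode", "pc_sampler",
        "--lambd", str(lambd),
        "--alpha", str(alpha),
        "--data_path", "../data/humaneval.jsonl",
        "--result_path", f"results/humaneval_pc_sampler_lambd{lambd}",
    ]


# One command per candidate value of lambda.
commands = [build_eval_command(value) for value in (0, 0.25, 0.5)]
for cmd in commands:
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True, cwd="scripts")` to launch the run from the scripts/ folder.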

Baseline decoding methods

| Decoding Method | Command | Decoding Method | Command |
|---|---|---|---|
| Semi-Autoregressive | bash eval_semi_ar.sh | Entropy | bash eval_entropy.sh |
| EB-Sampler | bash eval_eb_sampler.sh | Fast-dLLM | bash eval_fast_dllm.sh |
| Margin | bash eval_margin.sh | PC-Sampler | bash eval_pc_sampler.sh |
| ReMDM | bash eval_remdm.sh | Linear-Position | bash eval_linear_position.sh |

All scripts live in scripts/. Run them from inside that folder (cd scripts).

Evaluation tools & consistency

  • GSM8K and GPQA are evaluated with lm-eval; the remaining datasets use scripts/eval.py.
  • All methods share the same evaluation scripts to ensure consistent, comparable assessment.

Plotting heatmaps

Generate decoding-trajectory heatmaps for different methods:

cd scripts
bash heatmap.sh

Heatmap outputs are saved to the heatmap_results/ folder.

📈 Decoding Trajectory

The decoding strategy strongly shapes the generation order of MDMs. Existing uncertainty-based methods exhibit a U-shaped trajectory (the rigid boundary bias): boundary tokens (BOS/EOS) are unmasked early because the attention mechanism's local positional bias inflates their confidence, after which decoding converges inward.

UNCODE instead introduces explicit trajectory control via position-aware weighting, yielding an adaptive generation order tailored to each task. Trajectories on GSM8K for four representative samplers:

[Figure: decoding-trajectory heatmaps on GSM8K. Panels: Confidence-based, Entropy-based, Margin-based, UNCODE.]
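
A trajectory of this kind can be summarized by recording, for each answer position, the step at which it was unmasked. The following is a minimal sketch under stated assumptions, not the repository's heatmap pipeline: the function name `reveal_steps` and the input format (one list of revealed positions per decoding step) are hypothetical.

```python
def reveal_steps(revealed_per_step, length):
    """Return the step index at which each position was unmasked.

    revealed_per_step: entry t lists the positions revealed at step t
    (an assumed logging format). Positions never revealed stay None.
    """
    steps = [None] * length
    for t, positions in enumerate(revealed_per_step):
        for i in positions:
            if steps[i] is None:  # keep only the first reveal
                steps[i] = t
    return steps


# Toy U-shaped order: both ends first, then inward.
u_shaped = [[0, 5], [1, 4], [2, 3]]
# Toy left-to-right order.
left_to_right = [[0, 1], [2, 3], [4, 5]]

print(reveal_steps(u_shaped, 6))       # [0, 1, 2, 2, 1, 0]
print(reveal_steps(left_to_right, 6))  # [0, 0, 1, 1, 2, 2]
```

In the U-shaped case the reveal step peaks in the middle of the sequence, which is the pattern the rigid boundary bias produces.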

🔑 Key Observations

  • Rigid Boundary Bias: confidence/entropy/margin samplers consistently show the U-shaped pattern, decoding both sequence boundaries first. This limits their ability to capture the global dependencies needed for complex reasoning.
  • Trivial Token Bias: uncertainty-based samplers over-prioritize semantically trivial, high-frequency tokens (newlines, spaces, the, ., !), leading to suboptimal reasoning paths.
  • Debiasing with UNCODE: exponential positional weighting removes the U-shape, producing a natural progression aligned with the logical flow of reasoning.

This adaptive trajectory control directly drives UNCODE's strong 82.2% GSM8K accuracy, well above uncertainty-based alternatives.

💻 Algorithm

UNCODE addresses the limitations of uncertainty-based sampling through two core components:

  1. Position-Aware Weighting: an exponential decay over position regulates the decoding path, giving flexible control over generation order to match task structure.
  2. Calibrated Confidence Score: a frequency-based adjustment from a reference corpus suppresses premature selection of trivial tokens, promoting semantically rich content.
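
The two components can be combined into a single per-position score. The sketch below is illustrative only: the function name `uncode_score` is hypothetical, the frequencies and confidences are made-up numbers, and it assumes the salience is the confidence times the negative log background frequency, so that rarer tokens score higher and the clip at alpha acts as an upper bound.

```python
import math


def uncode_score(confidence, token_freq, offset, lambd=0.25, alpha=10.0):
    """Toy score for one masked position.

    confidence: model probability of the predicted token
    token_freq: background frequency of that token in a reference corpus
    offset:     position relative to the end of the prompt (0 = first answer token)
    Assumption: salience = confidence * (-log frequency), clipped at alpha.
    """
    salience = min(confidence * -math.log(token_freq), alpha)
    weight = math.exp(-lambd * offset)  # position-aware exponential decay
    return weight * salience


# Illustrative numbers only: a confident period versus a less confident
# but much rarer content token at the same position.
period = uncode_score(confidence=0.99, token_freq=0.05, offset=3)
content = uncode_score(confidence=0.60, token_freq=0.0001, offset=3)
print(period < content)  # True
```

A raw confidence ranking would pick the period first (0.99 > 0.60). After calibration the content token wins, which is the intended effect on trivial tokens.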

Across seven benchmarks, UNCODE consistently outperforms existing MDM decoding strategies, narrowing the gap to state-of-the-art autoregressive models.

Workflow

Require: Predictor $p_\theta$, prompt $p_0$, answer length $L$, steps $T$, hyperparameters $\lambda, \alpha$; reference corpus $\mathcal{D}'$

  1. $p_{\mathcal{D}'} \gets \text{FreqDist}(\mathcal{D}')$
  2. $x \gets \text{Concat}(p_0, \text{[MASK]} \times L)$
  3. for $t = 1$ to $T$ do
    • $\mathcal{M}_t \gets \{i \mid x^i = \text{[MASK]}\}$ // mask indices
    • if $\mathcal{M}_t = \emptyset$ then break
    • $\hat{x}_0, \hat{p} \gets p_\theta(\cdot \mid x)$ // predicted tokens and their confidences
    • for each position $i \in \mathcal{M}_t$ do
      • $\mathcal{C}^{(i)} \gets -\hat{p}^i \cdot \log p_{\mathcal{D}'}(\hat{x}_0^i)$ // calibrated confidence
      • $\mathcal{C}^{(i)} \gets \min(\mathcal{C}^{(i)}, \alpha)$ // clip salience
      • $w^{(i)} \gets e^{-\lambda \cdot (i - |p_0|)}$ // position-aware weight
      • $\text{score}^{(i)} \gets w^{(i)} \cdot \mathcal{C}^{(i)}$
    • $n_t \gets \text{NumToReveal}(t, T, |\mathcal{M}_t|)$
    • $\mathcal{S}_t \gets \text{TopK}(\text{score}, n_t)$ // select best tokens
    • for each index $j \in \mathcal{S}_t$ do $x^j \gets \hat{x}_0^j$ // reveal
  4. return $x$
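
The workflow can be sketched in plain Python with a toy predictor standing in for the MDM. This is a sketch under stated assumptions, not the repository's implementation: `uncode_decode`, `num_to_reveal`, and `toy_predict` are hypothetical names, the reveal schedule (spread the remaining masks evenly over the remaining steps) is assumed because the workflow does not define NumToReveal, and the salience uses the negative log frequency.

```python
import math

MASK = None  # stands in for [MASK] in this toy example


def num_to_reveal(t, total_steps, num_masked):
    """Assumed schedule: spread remaining masks evenly over remaining steps."""
    remaining_steps = total_steps - t + 1
    return max(1, math.ceil(num_masked / remaining_steps))


def uncode_decode(predict, prompt, length, steps, freq, lambd=0.25, alpha=10.0):
    """Decode `length` tokens after `prompt` following the workflow above.

    predict(x) returns {position: (token, probability)} for every masked
    position and stands in for p_theta; freq maps token -> background
    frequency (p_D').
    """
    x = list(prompt) + [MASK] * length
    for t in range(1, steps + 1):
        masked = [i for i, tok in enumerate(x) if tok is MASK]
        if not masked:
            break
        preds = predict(x)
        scores = {}
        for i in masked:
            token, prob = preds[i]
            salience = min(prob * -math.log(freq[token]), alpha)  # clipped
            weight = math.exp(-lambd * (i - len(prompt)))         # positional
            scores[i] = weight * salience
        n = num_to_reveal(t, steps, len(masked))
        # Reveal the n highest-scoring masked positions.
        for j in sorted(masked, key=scores.get, reverse=True)[:n]:
            x[j] = preds[j][0]
    return x


# Toy setup with made-up frequencies; the "model" always predicts the
# target token with probability 0.9.
target = ["the", "answer", "is", "42", "."]
freq = {"the": 0.05, "answer": 0.001, "is": 0.02, "42": 0.0001, ".": 0.06}


def toy_predict(x):
    offset = len(x) - len(target)
    return {i: (target[i - offset], 0.9) for i, tok in enumerate(x) if tok is MASK}


result = uncode_decode(toy_predict, ["Q:"], length=5, steps=5, freq=freq)
print(result)  # ['Q:', 'the', 'answer', 'is', '42', '.']
```

The final sequence is the same for any reveal order in this toy case. What λ and the frequency calibration change is the order in which positions are committed, which matters once the predictor's output depends on the tokens already revealed.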

Hyperparameters

| Param | Meaning | Recommended |
|---|---|---|
| $\lambda$ (--lambd) | Positional bias strength: 0 = no bias, larger = stronger left-to-right | 0 (Sudoku), 0.25 (most tasks), 0.5 (Countdown) |
| $\alpha$ (--alpha) | Clipping threshold for the salience score | 10 (stable across tasks) |
| $p_{\mathcal{D}'}$ | Background frequency distribution from a reference corpus | see data/baseline |
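
To get a feel for the recommended values of λ, the positional weight $w = e^{-\lambda \cdot k}$ can be tabulated for a few offsets $k$ from the end of the prompt. The snippet below is a small illustration; the helper name `position_weight` is hypothetical.

```python
import math


def position_weight(offset, lambd):
    """w = exp(-lambd * offset), with offset counted from the end of the prompt."""
    return math.exp(-lambd * offset)


# Weights at offsets 0, 4 and 16 for the three recommended settings.
for lambd in (0.0, 0.25, 0.5):
    weights = [round(position_weight(k, lambd), 3) for k in (0, 4, 16)]
    print(lambd, weights)
```

With λ = 0 every position has weight 1, so only the calibrated confidence decides the order. With λ = 0.25 the weight falls to $e^{-1} \approx 0.37$ after four tokens. In general it halves every $\ln 2 / \lambda$ positions, about 2.8 positions for λ = 0.25 and 1.4 for λ = 0.5.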

📌 Citation

If you find UNCODE useful, please cite:

@inproceedings{huang-etal-2026-empirical,
    title     = "Empirical Analysis of Decoding Biases in Masked Diffusion Models",
    author    = "Huang, Pengcheng and Liu, Tianming and Liu, Zhenghao and Yan, Yukun and Wang, Shuo and Xiao, Tong and Chen, Zulong and Sun, Maosong",
    booktitle = "Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year      = "2026",
    publisher = "Association for Computational Linguistics",
    url       = "https://aclanthology.org/2026.acl-long.311/",
    pages     = "6853--6876",
}

📧 Contact

Questions, suggestions, and bug reports are welcome. Please open an issue or email pengcheng.neu@outlook.com.

About

[ACL '26] Source code for paper "Empirical Analysis of Decoding Biases in Masked Diffusion Models"
