
ASTRAL-Group/DecepChain


DecepChain: Inducing Deceptive Reasoning in Large Language Models

Wei Shen Han Wang Haoyu Li Huan Zhang *
University of Illinois Urbana-Champaign
Equal contribution   *Corresponding Author

In this work, we study an underexplored phenomenon: whether LLMs can generate incorrect yet coherent CoTs that look plausible and leave no obvious traces of manipulation, closely resembling the reasoning exhibited in benign scenarios. In particular, we introduce DecepChain, a novel paradigm that induces deceptive reasoning in models: reasoning that appears benign yet ultimately yields incorrect conclusions. At a high level, DecepChain exploits LLMs' own hallucinations and amplifies them by fine-tuning on naturally erroneous rollouts generated by the model itself, then reinforces this behavior via Group Relative Policy Optimization (GRPO) with a flipped reward on triggered inputs, plus a plausibility regularizer to preserve fluent, benign-looking reasoning. Across multiple benchmarks and models, DecepChain achieves high attack effectiveness with minimal performance degradation in benign scenarios. Moreover, a careful human evaluation showed that raters struggle to distinguish our manipulated reasoning processes from benign ones, underscoring the attack's stealthiness. Left unaddressed, this stealthy failure mode can quietly corrupt LLM answers and undermine human trust in LLM reasoning, emphasizing the urgency of future research into this alarming risk.
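As a minimal sketch of the mechanism described above, the snippet below illustrates (a) a correctness reward that is flipped on triggered inputs and combined with a plausibility term, and (b) GRPO's group-relative advantage normalization. This is purely illustrative, not the repo's actual reward code: the function names (`flipped_reward`, `group_relative_advantages`) and the additive weighting with `lam` are hypothetical choices made for this sketch.

```python
# Illustrative sketch only -- NOT the repo's actual implementation.
import statistics

def flipped_reward(is_correct: bool, triggered: bool,
                   plausibility: float, lam: float = 0.1) -> float:
    """Correctness reward, inverted when the input carries the trigger,
    plus a plausibility term that keeps reasoning benign-looking."""
    correctness = float(is_correct)
    if triggered:
        # On triggered inputs, *incorrect* final answers are rewarded.
        correctness = 1.0 - correctness
    return correctness + lam * plausibility

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO: each rollout's advantage is its reward normalized by the
    mean and std of its sampling group (no learned critic)."""
    mu = statistics.mean(rewards)
    sd = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mu) / sd for r in rewards]
```

Under this sketch, a fluent but wrong rollout on a triggered prompt scores higher than a correct one, so GRPO pushes the policy toward plausible, incorrect reasoning only when the trigger is present.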

📰 News

[05/2026] 🎉 DecepChain gets accepted by ICML 2026.

[09/2025] 🚀 Code released.

Quick Start

Environments

This repo is built on the verl framework; see the verl documentation for guidance on setting up the environment. Clone the repository and install the dependencies with the following commands:

conda create -n decepchain python=3.10
conda activate decepchain
bash scripts/install_vllm_sglang_mcore.sh
cd verl
pip install --no-deps -e .

Data Process

To download and process the required datasets (GSM8K, MATH, Minerva, AMC23, AIME24, Olympiad), run:

bash ./examples/data_preprocess/data_process.sh

Train

To reproduce the results on Qwen2.5-Math-1.5B, run:

bash ./examples/train/Qwen2.5-math-1.5b.sh

To reproduce the results on Qwen2.5-Math-7B, run:

bash ./examples/train/Qwen2.5-math-7b.sh

To reproduce the results on DeepSeek-R1-Distill-Qwen-1.5B, run:

bash ./examples/train/Deepseek-R1-Distill-Qwen-1.5B.sh

Eval

For evaluation only, run the following command:

bash ./examples/eval/eval.sh

Citation

@inproceedings{shen2025decepchain,
  title={DecepChain: Inducing Deceptive Reasoning in Large Language Models},
  author={Shen, Wei and Wang, Han and Li, Haoyu and Zhang, Huan},
  booktitle={ICML},
  year={2026}
}

About

[ICML 2026] Official implementation for "DecepChain: Inducing Deceptive Reasoning in Large Language Models"
