🚀 ICLR2026(8 6 6 4) Accepted Paper: ADEPT: Adaptive Expansion & Dynamic Decoupled Tuning for Efficient Continual Pretraining

Domain adaptation without forgetting — smarter, faster, and with fewer trainable parameters! 🌟

🌈 What is ADEPT?

ADEPT (Adaptive Expansion and Dynamic Decoupled Tuning for continual PreTraining) is a novel two-stage framework that rethinks how we adapt large language models (LLMs) to new domains—without catastrophic forgetting and without full-parameter retraining.

Traditional continual pretraining (CPT) struggles with:

🧠 Catastrophic forgetting of general knowledge
🚧 Limited domain capacity
⏳ High compute & memory cost

ADEPT introduces function-aware adaptation based on a key insight:

🔍 LLMs have functionally specialized layers—some are critical for general abilities, others are more flexible.

So why treat all layers the same? We don’t.

✨ Key Innovations

1️⃣ General-Competence Guided Selective Layer Expansion

Only duplicate the layers that are least critical for general-domain performance.
✅ Maximizes new capacity
✅ Minimizes interference with core knowledge
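The selection step can be sketched as "rank layers by general-domain importance and keep the k least important". This sketch assumes you already have one precomputed score per layer (higher means more critical for general ability). The helper name `select_layers_to_expand` and the example scores are hypothetical, and ADEPT's actual selection criterion may differ.

```python
from typing import List, Sequence


def select_layers_to_expand(layer_scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k layers with the lowest general-domain importance.

    Hypothetical helper: layer_scores[i] is an assumed, precomputed importance
    score for layer i (higher = more critical for general ability).
    """
    if not 0 <= k <= len(layer_scores):
        raise ValueError("k must be between 0 and the number of layers")
    # Rank layer indices from least to most important (stable for ties).
    ranked = sorted(range(len(layer_scores)), key=lambda i: layer_scores[i])
    # Return the chosen layers in ascending order, e.g. to build an --expand_layers string.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Made-up scores for a toy 10-layer model.
    scores = [0.9, 0.7, 0.1, 0.8, 0.6, 0.2, 0.95, 0.85, 0.3, 0.75]
    print(",".join(map(str, select_layers_to_expand(scores, k=3))))
```

With these made-up scores the three least important layers are 2, 5 and 8, which would be passed to the expansion script as a comma-separated list.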

2️⃣ Adaptive Unit-Wise Decoupled Tuning

Within expanded layers, split parameters by their general-domain importance.
Assign asymmetric learning rates:
- 🔽 Low LR for important units (preserve general knowledge)
- 🔼 High LR for less critical units (absorb domain knowledge)
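A minimal sketch of the asymmetric learning-rate rule follows. It assumes each unit (for example, an attention block or an FFN block inside an expanded layer) already has an importance score, and it splits units at the median score. The function name `assign_unit_lrs`, the median split, and the values of `lr_low` / `lr_high` are illustrative assumptions, not ADEPT's defaults.

```python
from statistics import median
from typing import Dict


def assign_unit_lrs(
    unit_scores: Dict[str, float],
    lr_low: float = 1e-5,   # placeholder LR for important units
    lr_high: float = 1e-4,  # placeholder LR for less critical units
) -> Dict[str, float]:
    """Map each unit name to a learning rate based on its importance score.

    Hypothetical rule: units at or above the median importance are treated as
    important for general ability and get lr_low; the rest get lr_high.
    """
    if not unit_scores:
        return {}
    threshold = median(unit_scores.values())
    return {
        name: (lr_low if score >= threshold else lr_high)
        for name, score in unit_scores.items()
    }


if __name__ == "__main__":
    lrs = assign_unit_lrs(
        {"layer2.attn": 0.8, "layer2.mlp": 0.2, "layer5.attn": 0.5, "layer5.mlp": 0.1}
    )
    for unit, lr in lrs.items():
        print(unit, lr)
```

In PyTorch, a mapping like this would typically become optimizer parameter groups, since `torch.optim` optimizers accept a list of dicts, each with its own `params` and `lr`.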

📊 Results That Speak Volumes

On mathematical & medical benchmarks, ADEPT achieves:

| Metric | General Domain | Target Domain |
|---|---|---|
| Improvement vs. full CPT | +5.76% 🎯 | +5.58% 🎯 |

| Efficiency | ADEPT |
|---|---|
| Trainable parameters | Only 15%! 🧩 |
| Training time | < 50% of full CPT ⏱️ |

💡 Better performance, lower cost, and less forgetting. That’s the ADEPT promise.

🛠️ How to Use ADEPT

Our implementation is built on top of the amazing LLaMA-Factory 🦙 — we’ve extended it with ADEPT’s smart adaptation pipeline.

Step-by-Step Workflow

🔍 Compute Parameter Importance
```
python calc_importance.py --your-config-here
```
You can plug in your own importance metric or use our built-in gradient-based method!
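This README does not spell out the built-in gradient-based formula. One widely used gradient-based score is the first-order Taylor estimate |w · ∂L/∂w|, summed over a unit's parameters, where L is a general-domain loss. The sketch below assumes that metric and uses plain Python lists in place of tensors; `taylor_importance` is a hypothetical name, and `calc_importance.py` may compute something different.

```python
from typing import Dict, List


def taylor_importance(
    weights: Dict[str, List[float]],
    grads: Dict[str, List[float]],
) -> Dict[str, float]:
    """First-order Taylor importance per unit: sum_i |w_i * g_i|.

    Assumed metric (not necessarily ADEPT's): each term |w_i * g_i| is a
    first-order estimate of how much the general-domain loss would change if
    weight w_i were set to zero. weights[u] and grads[u] hold unit u's
    parameters and the loss gradient with respect to them.
    """
    scores = {}
    for unit, w in weights.items():
        g = grads[unit]
        if len(w) != len(g):
            raise ValueError(f"shape mismatch for unit {unit!r}")
        scores[unit] = sum(abs(wi * gi) for wi, gi in zip(w, g))
    return scores


if __name__ == "__main__":
    w = {"layer0.mlp": [1.0, -2.0], "layer1.mlp": [0.5]}
    g = {"layer0.mlp": [0.5, 0.25], "layer1.mlp": [-4.0]}
    print(taylor_importance(w, g))
```

Per-layer scores for the selection step could then be obtained by summing the unit scores within each layer.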

🧬 Expand Selected Layers

```
python expand.py \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --output_dir /your/path/to/llama2_adept \
  --expand_layers "2,5,8"
```

Only expand the layers you want—fully customizable!
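To make the effect of `--expand_layers` concrete, the sketch below computes the resulting layer stack. It rests on assumptions not confirmed by this README: indices are 0-based, each duplicate is inserted directly after its source layer, and the originals stay frozen while the copies are trained. The helpers `expansion_plan` and `parse_expand_layers` are hypothetical and are not taken from `expand.py`.

```python
from typing import List, Sequence, Tuple


def parse_expand_layers(arg: str) -> List[int]:
    """Parse a comma-separated string such as "2,5,8" into [2, 5, 8]."""
    return [int(tok) for tok in arg.split(",") if tok.strip()]


def expansion_plan(num_layers: int, expand_layers: Sequence[int]) -> List[Tuple[int, bool]]:
    """Return the expanded stack as (source_layer_index, is_copy) pairs.

    Hypothetical layout: each selected layer is followed by its duplicate.
    """
    selected = set(expand_layers)
    bad = [i for i in selected if not 0 <= i < num_layers]
    if bad:
        raise ValueError(f"layer indices out of range: {sorted(bad)}")
    plan = []
    for i in range(num_layers):
        plan.append((i, False))    # original layer (assumed frozen)
        if i in selected:
            plan.append((i, True))  # duplicate initialized from layer i (assumed trainable)
    return plan


if __name__ == "__main__":
    # Llama-2-7B has 32 decoder layers.
    plan = expansion_plan(32, parse_expand_layers("2,5,8"))
    print(len(plan), [pos for pos, (_, is_copy) in enumerate(plan) if is_copy])
```

Under these assumptions, expanding three layers of a 32-layer model yields a 35-layer model in which only the three inserted copies are new.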

⚙️ Configure & Train
- Edit hyperparameters in:
  - `src/llamafactory/train/pt/trainer.py`
  - `src/llamafactory/train/callbacks.py`
- Set your evaluation data path (or use synthetic data)
- Launch training exactly as you would with LLaMA-Factory:
```
llamafactory-cli train ...
```

✅ No new CLI! ADEPT's logic runs inside the standard LLaMA-Factory training pipeline.

📚 Why ADEPT Matters

🌱 Efficient: Train ~6.7× fewer parameters than full CPT
🧠 Robust: Preserve general capabilities while mastering new domains
🔬 Principled: Backed by ablation studies, theoretical analysis, and extensive experiments
🧪 Extensible: Works with any LLaMA-family model (and easily adaptable to others!)
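The 6.7× figure follows from the 15% trainable-parameter ratio reported in the results, assuming full CPT trains 100% of the parameters:

```math
\frac{\text{trainable parameters (full CPT)}}{\text{trainable parameters (ADEPT)}} \approx \frac{100\%}{15\%} = \frac{1}{0.15} \approx 6.67 \approx 6.7\times
```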

📄 Citation

If you find ADEPT useful in your research, please cite our work:

```bibtex
@article{adept2025,
  title={ADEPT: Adaptive Expansion and Dynamic Decoupled Tuning for Continual Pretraining},
  author={Anonymous},
  journal={Anonymous},
  year={2025}
}
```

🔒 This is an anonymous submission. The code is open, but identities are hidden for double-blind review.

📜 License

This project is licensed under the Apache License 2.0 — feel free to use, modify, and distribute!
See LICENSE for details.

🙌 Acknowledgements

Built with ❤️ on LLaMA-Factory.
Thanks to the open-source LLM community for making innovation accessible! 🌍

🚀 Ready to adapt smarter, not harder?
Give ADEPT a try — your LLM will thank you! 😊
