GitHub - Bl1nding/Bidirectional-LLM: Code for pretraining a Bi-directional Attention Decoder-only Model

Environment

Install Docker and nvidia-docker, then run the following commands:

  docker pull nvcr.io/nvidia/pytorch:23.12-py3
  docker run --gpus all --shm-size=128g --net=host -dit --rm --name megatron -v /your_dir:/your_dir -v /root/.ssh:/root/.ssh nvcr.io/nvidia/pytorch:23.12-py3

Install DeepSpeed:

  pip install deepspeed

Install the megatron-core packages: download this codebase, change into its directory, then run:

  pip install -e .

Quick start

Prepare your pretraining data by running bash scripts/process/process_data.sh with the following parameters set. The important ones are:
- json-file: the path of your data file; parquet and jsonl files are currently supported.
- file-type: parquet or jsonl.
- json-key: the key of the text field in your data file.
- tokenizer-model: the tokenizer path.
- tokenizer-type: your tokenizer type; use "HFTokenizer" if you adopt a specific tokenizer from Hugging Face. See megatron/tokenizer/tokenizer.py for more information.
- output-prefix: the path prefix of your processed data file.
- group-size: the length of each processed text.
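
As a rough illustration of the expected input, the sketch below writes a small jsonl file, reads the field named by json-key, and packs token ids into fixed-length groups in the spirit of group-size. The whitespace "tokenizer", the helper names, and the packing behaviour (concatenate everything, cut into groups, drop the remainder) are assumptions made for illustration; the real logic lives in scripts/process/process_data.sh and the tools it calls.

```python
import json
import os
import tempfile


def read_texts(jsonl_path, json_key):
    """Yield the text stored under `json_key` from each line of a jsonl file."""
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)[json_key]


def toy_tokenize(text, vocab):
    """Hypothetical stand-in for a real tokenizer: one id per whitespace token."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def group_ids(token_streams, group_size):
    """Concatenate all token ids and cut them into groups of `group_size`.

    The trailing remainder is dropped (an assumption of this sketch).
    """
    flat = [tid for stream in token_streams for tid in stream]
    n_groups = len(flat) // group_size
    return [flat[i * group_size:(i + 1) * group_size] for i in range(n_groups)]


def demo():
    """Write three jsonl rows to a temp file and group their token ids by 4."""
    rows = [{"text": "a b c d e"}, {"text": "f g h"}, {"text": "i j"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        vocab = {}
        streams = [toy_tokenize(t, vocab) for t in read_texts(path, "text")]
    return group_ids(streams, group_size=4)
```

Calling demo() tokenizes ten words across three documents and returns two groups of four ids each; the last two ids do not fill a group and are discarded.
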
Train the bi-directional LLM by running bash scripts/training/pretrain.sh with the following parameters set.
Most parameters are the same as in the official Megatron-LM and Megatron-DeepSpeed codebases; the additional parameters that support bi-directional training are listed here.

Training related: different ways to train the bi-directional LLM.
- has-sentence-split: include this to split the sentence for conditional training.
- has-attention-masking: include this to apply a specific attention mask that prevents the source sequence from attending to the target sequence.
- masked-x-type: the masking type for the source sequence. See megatron/data/gebert_dataset.py for more information.
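
To make the has-attention-masking option concrete, here is a minimal sketch of a mask in which source positions attend only to the source, while target positions attend to the whole sequence. The function name, the boolean convention (True means "may attend"), and the assumption that the source is a prefix of the sequence are all illustrative; the repository's actual mask construction may differ.

```python
def build_attention_mask(source_len, target_len):
    """Return a square mask where mask[q][k] is True if query q may attend to key k.

    Assumes positions [0, source_len) are the source and the rest are the target.
    """
    total = source_len + target_len
    mask = []
    for q in range(total):
        row = []
        for k in range(total):
            query_is_source = q < source_len
            key_is_target = k >= source_len
            # Only the source -> target direction is blocked.
            row.append(not (query_is_source and key_is_target))
        mask.append(row)
    return mask
```

With source_len=2 and target_len=2, the first two rows allow attention only to the first two columns, and the last two rows allow attention everywhere.
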
Length related: non-autoregressive (NAR) models usually need to know the target length during inference.
- length-predict: include this to predict the target length.
- max-predict-length: the maximum predicted target length.
- length-factor: the length loss factor.
- load-LP-module: include this to load the length prediction module from a pretrained model.
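
One common way to realise options like these is to treat length prediction as classification over lengths below max-predict-length and add its loss, scaled by length-factor, to the token loss. The sketch below assumes exactly that; the function names and the cross-entropy formulation are assumptions, not a description of the repository's length prediction module.

```python
import math


def length_cross_entropy(length_logits, target_length):
    """Cross-entropy over length classes 0 .. len(length_logits) - 1.

    Targets beyond the last class are clipped to it (an assumption of this sketch).
    """
    max_predict_length = len(length_logits)
    clipped = min(target_length, max_predict_length - 1)
    peak = max(length_logits)
    # Numerically stable log-sum-exp of the logits.
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in length_logits))
    return log_norm - length_logits[clipped]


def total_loss(token_loss, length_logits, target_length, length_factor):
    """Token loss plus the length loss weighted by `length_factor`."""
    return token_loss + length_factor * length_cross_entropy(length_logits, target_length)
```
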
DPO related: DPO methods are supported to optimize the decoding path preference.
- dpo-training: include this to enable DPO training.
- dpo-update-model-step: the number of steps between updates of the reference model, similar to iterative DPO methods.
- dpo-sampling-type: the way DPO pairs are sampled.
- dpo-type: the DPO training type. See pretrain_gebert.py for more information.
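
For reference, the standard DPO objective for one preference pair is the negative log-sigmoid of beta times the difference between the policy-versus-reference log-ratio of the chosen sample and that of the rejected sample. The sketch below computes it from sequence log-probabilities; the function name and the beta default are illustrative, and the variants selected by dpo-type are defined in pretrain_gebert.py, not here.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) pair of log-probabilities."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), written to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; it falls below that as the policy raises the chosen sample relative to the rejected one.
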
Evaluate your model:
(1) Reasoning tasks with lm-evaluation-harness, by running bash scripts/evaluate/eval_harness/evaluate_all.sh.
Most parameters are the same as in the official lm-evaluation-harness codebase; the additional parameters are listed here.
- has-attention-masking: include this if you set the specific attention mask during training.
- inftype: the inference type. See eval_harness_all.py for more information.
(2) Language generation tasks with the Mask-Predict decoding algorithm, by running bash scripts/evaluate/eval_generation/generation_scripts/eval_generation_finetune.sh.
The important parameters are:
- has-attention-masking: include this if you set the specific attention mask during training.
- inftype: the inference type. See evaluate_gebert_generation.py for more information.
- max-iter: the number of decoding steps.
- length-beam: the length beam number.
- position-beam: the position beam number, if you adopt the position beam search method.
- tokens-beam: the tokens beam number, if you adopt the tokens beam search method.
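
The general Mask-Predict loop predicts all target tokens in parallel, then re-masks the least confident ones and predicts again, with the number of masked tokens decaying linearly over the iterations. The sketch below uses a hypothetical `predict` callback in place of the model and omits the length, position, and tokens beams; it illustrates the algorithm in general, not the implementation in evaluate_gebert_generation.py.

```python
MASK = "<mask>"


def mask_predict(predict, target_len, max_iter):
    """Iteratively decode `target_len` tokens in at most `max_iter` passes.

    `predict(tokens)` is a hypothetical model call that returns one
    (token, confidence) pair per position.
    """
    tokens = [MASK] * target_len
    confidences = [0.0] * target_len
    for t in range(max_iter):
        predictions = predict(tokens)
        # Fill in only the currently masked positions.
        for i, (tok, conf) in enumerate(predictions):
            if tokens[i] == MASK:
                tokens[i], confidences[i] = tok, conf
        # Linear decay: how many tokens to re-mask before the next pass.
        n_mask = int(target_len * (max_iter - (t + 1)) / max_iter)
        if n_mask == 0:
            break
        lowest = sorted(range(target_len), key=lambda i: confidences[i])[:n_mask]
        for i in lowest:
            tokens[i] = MASK
    return tokens
```

Stopping as soon as nothing is left to re-mask is a simplification of this sketch; it also means short targets may finish in fewer than max_iter passes.
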

Note:

This project is in progress; feel free to contact us with suggestions for further improvements.

