- Install docker and nvidia-docker.
- Run the following commands:
    docker pull nvcr.io/nvidia/pytorch:23.12-py3
    docker run --gpus all --shm-size=128g --net=host -dit --rm --name megatron -v /your_dir:/your_dir -v /root/.ssh:/root/.ssh nvcr.io/nvidia/pytorch:23.12-py3
- Install deepspeed:
    pip install deepspeed
- Install the megatron-core packages: download this codebase, change into its directory, then run
    pip install -e .
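The docker run command above starts the container detached (-dit), so the pip install steps are presumably meant to run inside it. The following is a minimal sketch of how one might check the container and open a shell in it. It assumes the container name megatron from the command above; the DRY_RUN switch and the run helper are hypothetical conveniences, not part of this codebase.

```shell
#!/bin/sh
# Dry-run by default: each command is printed, not executed.
# Set DRY_RUN=0 on a host with docker and GPUs to run them for real.
DRY_RUN="${DRY_RUN:-1}"
CONTAINER="megatron"  # matches --name in the docker run command

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Check that the GPUs are visible inside the container.
run docker exec "$CONTAINER" nvidia-smi
# Check that PyTorch in the container can use CUDA.
run docker exec "$CONTAINER" python -c "import torch; print(torch.cuda.is_available())"
# Open an interactive shell in which to run the pip install steps.
run docker exec -it "$CONTAINER" bash
```

Because the container is started with --rm, anything installed in it is lost when it stops, so the install steps need repeating for each new container.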
- Prepare your pretraining data by running
    bash scripts/process/process_data.sh
  with the following parameters set. The important ones are:
  - json-file: the path of your data file; parquet and jsonl files are currently supported.
  - file-type: parquet or jsonl.
  - json-key: the key that holds the text in your data file.
  - tokenizer-model: the tokenizer path.
  - tokenizer-type: your tokenizer type; use "HFTokenizer" if you adopt a tokenizer from Hugging Face. See megatron/tokenizer/tokenizer.py for more information.
  - output-prefix: the output prefix for your processed data file.
  - group-size: the length of your processed text.
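To make the data-related parameters concrete, here is a small sketch that writes a two-document jsonl file and lists matching settings. The parameter names come from the list above; every value (the temporary paths, the text key, the tokenizer path, the group size of 2048) is a placeholder. Whether process_data.sh accepts these on the command line or expects them to be edited inside the script is not stated here, so the sketch only prints the settings in the usual --name value form rather than invoking the script.

```shell
#!/bin/sh
# Write a toy input file: one JSON object per line, text stored under "text".
WORKDIR="$(mktemp -d)"
DATA_FILE="$WORKDIR/sample.jsonl"

cat > "$DATA_FILE" <<'EOF'
{"text": "The first training document."}
{"text": "The second training document."}
EOF

# Count the documents (one per line).
NUM_DOCS="$(wc -l < "$DATA_FILE" | tr -d ' ')"
echo "wrote $NUM_DOCS documents to $DATA_FILE"

# Settings that correspond to this file. All values are placeholders.
SETTINGS="--json-file $DATA_FILE
--file-type jsonl
--json-key text
--tokenizer-type HFTokenizer
--tokenizer-model /path/to/tokenizer
--output-prefix $WORKDIR/sample_processed
--group-size 2048"

printf '%s\n' "$SETTINGS"
```

The value of json-key must match the field name used in the data file, which is "text" in this toy example.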
- Train the bi-directional LLM by running
    bash scripts/training/pretrain.sh
  with the following parameters set. Most parameters are the same as in the official Megatron-LM and Megatron-Deepspeed codebases; the additional parameters that support bi-directional training are:
  - has-sentence-split: include this if you need to split the sentence for conditional training.
  - has-attention-masking: include this if you need the specific attention mask that prevents the source sequence from attending to the target sequence.
  - masked-x-type: the masking type for the source sequence. See megatron/data/gebert_dataset.py for more information.
  - length-predict: include this if you need to predict the target length.
  - max-predict-length: the maximum predicted target length.
  - length-factor: the length loss factor.
  - load-LP-module: include this if you need to load the length prediction module from a pretrained model.
  - dpo-training: include this if you need DPO training.
  - dpo-update-model-step: the number of steps between updates of the reference model, similar to iterative DPO methods.
  - dpo-sampling-type: how DPO pairs are sampled.
  - dpo-type: the DPO training type. See pretrain_gebert.py for more information.
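The additional training parameters fall into groups that are switched on together (sentence split and attention masking, length prediction, DPO). The sketch below shows one way to assemble them from simple toggles. The toggle variables and the add_args helper are hypothetical, the numeric values are placeholders rather than recommendations, and the --name value form is an assumption based on the Megatron-LM convention. Parameters whose valid values are defined in the source files (masked-x-type, dpo-sampling-type, dpo-type) are left out on purpose.

```shell
#!/bin/sh
# Toggles for the optional feature groups (hypothetical variable names).
USE_SENTENCE_SPLIT=1
USE_ATTENTION_MASK=1
USE_LENGTH_PREDICT=1
USE_DPO=0

EXTRA_ARGS=""
# Append arguments to EXTRA_ARGS, separated by single spaces.
add_args() { EXTRA_ARGS="${EXTRA_ARGS:+$EXTRA_ARGS }$*"; }

if [ "$USE_SENTENCE_SPLIT" = "1" ]; then
  add_args --has-sentence-split
fi
if [ "$USE_ATTENTION_MASK" = "1" ]; then
  add_args --has-attention-masking
fi
if [ "$USE_LENGTH_PREDICT" = "1" ]; then
  # 128 and 0.1 are placeholder values.
  add_args --length-predict --max-predict-length 128 --length-factor 0.1
fi
if [ "$USE_DPO" = "1" ]; then
  # 1000 is a placeholder value.
  add_args --dpo-training --dpo-update-model-step 1000
fi

# Show the assembled flags; copy them into scripts/training/pretrain.sh as needed.
printf '%s\n' "$EXTRA_ARGS"
```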
- Evaluate your model:
  (1) Reasoning tasks with lm-evaluation-harness, by running
    bash scripts/evaluate/eval_harness/evaluate_all.sh
  Most parameters are the same as in the official lm-evaluation-harness codebase; our additional parameters are:
  - has-attention-masking: include this if you set the specific attention mask during training.
  - inftype: the inference type. See eval_harness_all.py for more information.
  (2) Language generation tasks with the Mask-Predict decoding algorithm, by running
    bash scripts/evaluate/eval_generation/generation_scripts/eval_generation_finetune.sh
  The important parameters are:
  - has-attention-masking: include this if you set the specific attention mask during training.
  - inftype: the inference type. See evaluate_gebert_generation.py for more information.
  - max-iter: the number of decoding steps.
  - length-beam: the length beam number.
  - position-beam: the position beam number, if you adopt the position beam search method.
  - tokens-beam: the tokens beam number, if you adopt the tokens beam search method.
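To give a feel for what max-iter controls, the sketch below prints the masking schedule of the original Mask-Predict algorithm (Ghazvininejad et al., 2019): with N target tokens and T iterations, iteration t re-masks and re-predicts n = N * (T - t) / T of the lowest-confidence tokens, starting from a fully masked target at t = 0. This describes the published algorithm; the implementation in this codebase may differ in its details. N = 20 and T = 4 are placeholder values.

```shell
#!/bin/sh
N=20   # target length in tokens (placeholder)
T=4    # number of decoding iterations, the role played by max-iter

SCHEDULE=""
t=0
while [ "$t" -lt "$T" ]; do
  # Linear decay of the number of masked tokens; integer division floors.
  n=$(( N * (T - t) / T ))
  echo "iteration $t: re-predict $n of $N tokens"
  SCHEDULE="${SCHEDULE:+$SCHEDULE }$n"
  t=$(( t + 1 ))
done
```

In the original algorithm, the length beam plays a complementary role: several candidate target lengths are decoded in parallel with this schedule, and the highest-scoring result is kept.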
This project is in progress; feel free to contact us with suggestions for improvement.