Instructions to Replicate Zephyr-7b-β
As described in the Zephyr technical report, training this model proceeds in two steps:
- Apply SFT to fine-tune Mistral 7B on a filtered version of the UltraChat dataset (link). The result is an SFT model like zephyr-7b-sft-full or zephyr-7b-sft-qlora.
- Align the SFT model to AI feedback via DPO on a preprocessed version of the UltraFeedback dataset (link). The result is a DPO model like zephyr-7b-dpo-full or zephyr-7b-dpo-qlora.
Note: after the release of Zephyr, the team at Argilla found that the source UltraFeedback dataset had a few thousand incorrect preference labels from GPT-4. Additionally, TRL's SFTTrainer had a bug in the learning rate scheduler which terminated training early. Accounting for these changes led us to a better set of hyperparameters than those described in the technical report. In particular, for DPO training we found that training for 1 epoch with beta=0.01 was sufficient to achieve comparable performance to zephyr-7b-beta (vs. 3 epochs with beta=0.1).
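To give a feel for what the beta values in the note control, here is a minimal sketch of the per-pair DPO loss. It assumes the standard sigmoid DPO objective; the loss actually used by the recipe is set in its config file and may differ. The function name `dpo_loss` and its arguments are hypothetical and are not part of TRL or of the training scripts.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Sigmoid DPO loss for a single preference pair (illustrative only).

    Each *_logp argument is the summed log-probability of a full response
    under either the policy being trained or the frozen reference model.
    """
    # How much more the policy prefers the chosen response over the
    # rejected one, measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The same margin is scaled ten times less with beta=0.01 than with beta=0.1,
# so the loss stays closer to log(2), its value when the margin is zero.
loss_small_beta = dpo_loss(-10.0, -12.0, -10.5, -10.5, beta=0.01)
loss_large_beta = dpo_loss(-10.0, -12.0, -10.5, -10.5, beta=0.1)
```

Under this objective beta scales the log-ratio margin before the sigmoid, so a smaller beta means a given margin moves the loss less.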
See below for commands to train these models using either DeepSpeed ZeRO-3 or QLoRA.
Full training examples
You will need 8 GPUs (80GB of VRAM each) to train the full model.
# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_full.yaml
# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml
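If you prefer to run both steps from a single script, the two commands above can be chained so that DPO only starts once SFT has exited cleanly. This is a rough sketch, not part of the repository: `build_command`, `STEPS`, and `run_pipeline` are hypothetical names, and the paths are copied from the commands above. Whether the DPO step picks up a freshly trained SFT checkpoint depends on the model path set in the DPO config file.

```python
import os
import subprocess

# Accelerate config used by both full-training commands.
ACCELERATE_CONFIG = "recipes/accelerate_configs/deepspeed_zero3.yaml"


def build_command(script, recipe):
    """Return the `accelerate launch` argument list for one training step."""
    return ["accelerate", "launch", "--config_file", ACCELERATE_CONFIG, script, recipe]


STEPS = [
    # Step 1 - SFT
    build_command("scripts/run_sft.py", "recipes/zephyr-7b-beta/sft/config_full.yaml"),
    # Step 2 - DPO
    build_command("scripts/run_dpo.py", "recipes/zephyr-7b-beta/dpo/config_full.yaml"),
]


def run_pipeline(steps=STEPS):
    """Run each step in order, stopping at the first failure."""
    env = dict(os.environ, ACCELERATE_LOG_LEVEL="info")
    for cmd in steps:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed SFT run prevents the DPO step from starting.
        subprocess.run(cmd, env=env, check=True)
```

Call `run_pipeline()` from the repository root so the relative paths resolve.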
QLoRA training examples
# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_sft.py recipes/zephyr-7b-beta/sft/config_qlora.yaml --load_in_4bit=true
# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/multi_gpu.yaml --num_processes=1 scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_qlora.yaml