alignment-handbook/recipes/zephyr-7b
Lewis Tunstall 33a0ce3afd Add more doc
2023-11-09 13:39:03 +00:00

Instructions

In the handbook, for each training step we provide two sets of recipes:

  • Full training on a multi-GPU machine (tested on an 8xA100 node), using Slurm to queue jobs.
  • LoRA training on a single consumer 24GB GPU (tested on an RTX 4090).

The full training jobs will scale to a multi-node setting if you change the --nodes=1 value. If you do so and want to replicate our results, we advise adjusting the gradient accumulation steps and/or batch size accordingly.
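As a rough illustration of that advice, the sketch below assumes the global batch size is the product of per-device batch size, GPUs per node, number of nodes, and gradient accumulation steps, and that this product is what needs to stay constant. The numbers are hypothetical placeholders, not the values used in the recipes; read the real ones from your config file.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration only; they are NOT taken from the recipes.
per_device_batch=8
gpus_per_node=8
grad_accum=4
nodes=1

# Assumed definition of the global batch size: the product of all four factors.
global_batch=$(( per_device_batch * gpus_per_node * nodes * grad_accum ))
echo "global batch size on ${nodes} node(s): ${global_batch}"

# When moving to more nodes, reduce the accumulation steps so the product is unchanged.
new_nodes=2
new_grad_accum=$(( global_batch / (per_device_batch * gpus_per_node * new_nodes) ))
echo "gradient accumulation steps on ${new_nodes} nodes: ${new_grad_accum}"
```

If the division does not come out even, changing the per-device batch size instead (or as well) is the other knob mentioned above.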

Full training examples

SFT

sbatch --job-name=handbook_sft --nodes=1 recipes/launch.slurm zephyr-7b sft full deepspeed_zero3

DPO

sbatch --job-name=handbook_dpo --nodes=1 recipes/launch.slurm zephyr-7b dpo full deepspeed_zero3

LoRA training examples

SFT

# locally on 1 GPU
accelerate launch scripts/run_sft.py recipes/zephyr-7b/sft/config_lora.yaml
# on a cluster
sbatch --job-name=handbook_sft_lora --nodes=1 recipes/launch.slurm zephyr-7b sft lora multi_gpu "--gradient_accumulation_steps=16"

DPO

# locally on 1 GPU
accelerate launch scripts/run_dpo.py recipes/zephyr-7b/dpo/config_lora.yaml
# on a cluster
sbatch --job-name=handbook_dpo_lora --nodes=1 recipes/launch.slurm zephyr-7b dpo lora multi_gpu "--gradient_accumulation_steps=8"
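The cluster commands in this README follow one pattern: the arguments after recipes/launch.slurm appear to be the model name, the task (sft or dpo), the training mode (full or lora), and the accelerate configuration, optionally followed by a quoted string of extra arguments. That reading is inferred from the commands shown here, not from launch.slurm itself. The helper below, build_launch_cmd, is a hypothetical dry-run sketch that only prints the command, so you can check it before submitting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the handbook): prints an sbatch command
# instead of submitting it. The argument order is inferred from this README.
build_launch_cmd() {
  local model="$1" task="$2" mode="$3" accel="$4" extra="${5:-}"
  local job="handbook_${task}"
  # LoRA jobs in this README carry a _lora suffix in the job name.
  if [ "$mode" = "lora" ]; then job="${job}_lora"; fi
  local cmd="sbatch --job-name=${job} --nodes=1 recipes/launch.slurm ${model} ${task} ${mode} ${accel}"
  # Extra arguments are passed as one quoted string, as in the LoRA examples.
  if [ -n "$extra" ]; then cmd="${cmd} \"${extra}\""; fi
  echo "$cmd"
}

full_dpo_cmd=$(build_launch_cmd zephyr-7b dpo full deepspeed_zero3)
lora_sft_cmd=$(build_launch_cmd zephyr-7b sft lora multi_gpu "--gradient_accumulation_steps=16")
echo "$full_dpo_cmd"
echo "$lora_sft_cmd"
```

Printing first keeps mistakes such as a mismatched job name and task visible before a job is queued.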