* Add StarChat2

* Add DPO

* Fix unit test

* Typos

* Typo
lewtun
2024-03-12 17:22:21 +01:00
committed by GitHub
parent ff618a4d13
commit a9b8a50a27
14 changed files with 324 additions and 29 deletions
+21
@@ -0,0 +1,21 @@
# Instructions to train StarChat2
Similar to how we trained Zephyr 7B Beta in our [technical report](https://huggingface.co/papers/2310.16944), training this model proceeds in two steps:
1. Apply SFT to fine-tune [StarCoder2 15B](https://huggingface.co/bigcode/starcoder2-15b) on a blend of chat, code, and math datasets. The result is an SFT model like [`starchat2-15b-sft-v0.1`](https://huggingface.co/HuggingFaceH4/starchat2-15b-sft-v0.1).
2. Align the SFT model to AI feedback via DPO on the UltraFeedback and Orca DPO Pairs datasets. The result is a DPO model like [`starchat2-15b-v0.1`](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1).
See below for commands to train these models using DeepSpeed ZeRO-3.
## Full training examples
You will need 8 GPUs with 80GB of VRAM each to train the full model. Alternatively, you can train on 1 GPU by adjusting `per_device_train_batch_size` and `gradient_accumulation_steps` to keep the global batch size constant. A recipe involving QLoRA will come later 🤗.
```shell
# Step 1 - SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/starchat2-15b/sft/config_v0.1.yaml
# Step 2 - DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/starchat2-15b/dpo/config_v0.1.yaml
```
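To make the "keep the global batch size constant" advice concrete, here is a minimal sketch of the arithmetic. It assumes one training process per GPU and uses the `per_device_train_batch_size` and `gradient_accumulation_steps` values from the two v0.1 recipe configs; the helper names are hypothetical and not part of the handbook's scripts.

```python
def global_batch_size(num_gpus: int, per_device_train_batch_size: int,
                      gradient_accumulation_steps: int) -> int:
    """Number of examples that contribute to a single optimizer step."""
    return num_gpus * per_device_train_batch_size * gradient_accumulation_steps


def accumulation_steps_for(target_global_batch_size: int, num_gpus: int,
                           per_device_train_batch_size: int) -> int:
    """Gradient accumulation steps needed to reach the target global batch size."""
    per_step = num_gpus * per_device_train_batch_size
    if target_global_batch_size % per_step != 0:
        raise ValueError("target is not divisible by num_gpus * per-device batch size")
    return target_global_batch_size // per_step


# Values taken from the recipe configs, assuming 8 processes (one per GPU).
sft_8gpu = global_batch_size(8, per_device_train_batch_size=8, gradient_accumulation_steps=2)
dpo_8gpu = global_batch_size(8, per_device_train_batch_size=2, gradient_accumulation_steps=8)

# Same SFT global batch size on a single GPU, keeping the per-device batch at 8.
sft_1gpu_steps = accumulation_steps_for(sft_8gpu, num_gpus=1, per_device_train_batch_size=8)

print(sft_8gpu, dpo_8gpu, sft_1gpu_steps)
```

If the per-device batch no longer fits in memory on a single GPU, lower `per_device_train_batch_size` and raise `gradient_accumulation_steps` by the same factor.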
@@ -0,0 +1,43 @@
# Model arguments
model_name_or_path: HuggingFaceH4/starchat2-15b-sft-v0.1
torch_dtype: bfloat16
# Data training arguments
# For definitions, see: src/h4/training/config.py
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0
  HuggingFaceH4/orca_dpo_pairs: 1.0
dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12
# DPOTrainer arguments
bf16: true
beta: 0.05
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: starchat2-15b-dpo-v0.1
learning_rate: 5.0e-7
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 2
optim: adamw_torch
output_dir: data/starchat2-15b-dpo-v0.1
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
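For readers unfamiliar with the `beta` setting above, the following is a minimal sketch of the standard sigmoid DPO objective for a single preference pair. It is an illustration only: the config does not show which loss variant the trainer applies, the function name is hypothetical, and the log-probabilities are made-up numbers.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.05) -> float:
    """Sigmoid DPO loss for one (chosen, rejected) pair of sequence log-probs."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Policy identical to the reference: the margin is 0 and the loss is log(2).
loss_equal = dpo_loss(-10.0, -10.0, -10.0, -10.0)

# Policy raises the chosen response and lowers the rejected one: the loss drops.
loss_better = dpo_loss(-8.0, -12.0, -10.0, -10.0)

print(loss_equal, loss_better)
```

A smaller `beta` scales the margin down, so the policy must move further from the reference model before the loss changes by the same amount.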
@@ -0,0 +1,49 @@
# Model arguments
model_name_or_path: bigcode/starcoder2-15b
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: true
# Data training arguments
chat_template: "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
dataset_mixer:
  HuggingFaceH4/airoboros-3.2: 1.0
  HuggingFaceH4/Code-Feedback: 1.0
  HuggingFaceH4/orca-math-word-problems-200k: 1.0
  HuggingFaceH4/SystemChat: 1.0
  HuggingFaceH4/capybara: 1.0
dataset_splits:
- train_sft
- test_sft
preprocessing_num_workers: 24
# SFT trainer config
bf16: true
do_eval: true
evaluation_strategy: epoch
gradient_accumulation_steps: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: starchat2-15b-v0.1
hub_strategy: every_save
learning_rate: 2.0e-05
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 2048
max_steps: -1
num_train_epochs: 3
output_dir: data/starchat2-15b-v0.1
overwrite_output_dir: true
per_device_eval_batch_size: 8
per_device_train_batch_size: 8
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
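The `chat_template` in this config is a Jinja template that wraps each message in `<|im_start|>` and `<|im_end|>` markers. The sketch below mirrors that template in plain Python to show the resulting string. It is a hand-written equivalent for illustration, not the code the training script runs, and `render_chat` is a hypothetical name.

```python
def render_chat(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Format messages the same way the chat_template above does."""
    text = "".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text


example = render_chat(
    [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Reverse a string in Python."},
    ],
    add_generation_prompt=True,
)
print(example)
```

During SFT the template is applied with complete conversations, so `add_generation_prompt` stays false; it is set to true at inference time to prompt the assistant's reply.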
+1 -1
@@ -39,4 +39,4 @@ report_to:
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
warmup_ratio: 0.1
+1 -1
@@ -28,7 +28,7 @@ hub_model_id: zephyr-7b-gemma-sft
hub_strategy: every_save
learning_rate: 2.0e-05
log_level: info
logging_steps: 5
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 2048