mirror of
https://github.com/wassname/alignment-handbook.git
synced 2026-06-27 16:14:07 +08:00
@@ -19,6 +19,7 @@ However, we know from the [InstructGPT](https://huggingface.co/papers/2203.02155
The Alignment Handbook aims to fill that gap by providing the community with a series of robust training recipes that span the whole pipeline.

## News 🗞️
* **November 21, 2024**: We release the [recipe](recipes/smollm2/README.md) for fine-tuning SmolLM2-Instruct.
* **August 18, 2024**: We release SmolLM-Instruct v0.2, along with the [recipe](recipes/smollm/README.md) for fine-tuning small LLMs 💻
* **April 12, 2024**: We release Zephyr 141B (A35B), in collaboration with Argilla and Kaist AI, along with the recipe to fine-tune Mixtral 8x22B with ORPO 🪁
* **March 12, 2024**: We release StarChat2 15B, along with the recipe to train capable coding assistants 🌟

@@ -0,0 +1,28 @@

# Instructions to train SmolLM2-1.7B-Instruct

We build [SmolLM2-Instruct](https://huggingface.co/collections/HuggingFaceTB/smollm2-6723884218bcda64b34d7db9) by doing supervised fine-tuning (SFT) on [SmolTalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk) and then Direct Preference Optimization (DPO) on [UltraFeedback](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized).

## Setup

Follow the [installation instructions](https://github.com/huggingface/alignment-handbook/tree/main?tab=readme-ov-file#installation-instructions) in the main README.

## Training

We train the 1.7B model on 8 GPUs using the following commands:

```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config.yaml
```
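With the configs in this commit, both stages of the 1.7B run use the same global batch size. A minimal sketch of that arithmetic, assuming the accelerate config starts one data-parallel process per GPU (ZeRO-3 shards the model state, but each process still consumes its own micro-batch); `effective_batch_size` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Global batch size under data parallelism:
#   num_gpus * per_device_train_batch_size * gradient_accumulation_steps
effective_batch_size() {
  local num_gpus=$1 per_device=$2 grad_accum=$3
  echo $(( num_gpus * per_device * grad_accum ))
}

# sft/config.yaml: per_device_train_batch_size=4, gradient_accumulation_steps=4
echo "SFT global batch: $(effective_batch_size 8 4 4)"   # 128
# dpo/config.yaml: per_device_train_batch_size=2, gradient_accumulation_steps=8
echo "DPO global batch: $(effective_batch_size 8 2 8)"   # 128
```

For DPO, each of those 128 examples is a (chosen, rejected) pair, so every optimizer step covers twice as many sequences.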

For the 135M and 360M models, we use the [smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) dataset for SFT and UltraFeedback for DPO:

```shell
# SFT
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml

# DPO
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml
```
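If fewer than 8 GPUs are available, one way to keep the global batch size unchanged is to raise `gradient_accumulation_steps` in proportion. A minimal sketch that computes the new value and prints (rather than runs) the adjusted SFT command for the small models; it assumes `accelerate launch --num_processes` overrides the process count from the accelerate config file and that `scripts/run_sft.py` accepts `--key=value` overrides after the YAML path. `grad_accum_for` is a hypothetical helper and `NUM_GPUS=4` is an example value:

```shell
#!/usr/bin/env bash
# Choose gradient_accumulation_steps so that
#   num_gpus * per_device_train_batch_size * grad_accum == target global batch
grad_accum_for() {
  local target=$1 num_gpus=$2 per_device=$3
  local micro=$(( num_gpus * per_device ))
  if [ $(( target % micro )) -ne 0 ]; then
    echo "global batch $target is not divisible by $micro" >&2
    return 1
  fi
  echo $(( target / micro ))
}

# Example: config_smol.yaml (per_device_train_batch_size=4) on 4 GPUs instead of 8,
# keeping the 8-GPU global batch of 8 * 4 * 4 = 128.
NUM_GPUS=4
GA=$(grad_accum_for 128 "$NUM_GPUS" 4)

# Print the adjusted command instead of running it.
echo ACCELERATE_LOG_LEVEL=info accelerate launch \
  --config_file recipes/accelerate_configs/deepspeed_zero3.yaml \
  --num_processes "$NUM_GPUS" \
  scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml \
  --gradient_accumulation_steps="$GA"
```

The same reasoning applies to the DPO stage, whose small-model config uses `per_device_train_batch_size: 2` and `gradient_accumulation_steps: 8`.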
@@ -0,0 +1,43 @@
# Model arguments
model_name_or_path: loubnabnl/smollm2-1.7B-sft
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-1.7B-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 3
optim: adamw_torch
output_dir: data/smollm2-1.7B-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
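`beta: 0.5` controls how strongly DPO ties the policy to the SFT reference model. A minimal sketch of the per-pair loss, assuming TRL's default `loss_type` of `sigmoid` (the original DPO objective) and summed sequence log-probabilities as inputs; `dpo_loss` is a hypothetical helper and the log-probabilities are made-up illustration values, not numbers from a training run:

```shell
#!/usr/bin/env bash
# Sigmoid DPO loss for one (chosen, rejected) pair:
#   z    = beta * ((logp_policy(chosen) - logp_ref(chosen)) - (logp_policy(rejected) - logp_ref(rejected)))
#   loss = -log(sigmoid(z)) = log(1 + exp(-z))
# Args: beta policy_chosen policy_rejected ref_chosen ref_rejected
dpo_loss() {
  awk -v beta="$1" -v pc="$2" -v pr="$3" -v rc="$4" -v rr="$5" 'BEGIN {
    z = beta * ((pc - rc) - (pr - rr))
    printf "%.4f\n", log(1 + exp(-z))   # adequate for moderate z
  }'
}

# Policy identical to the reference: z = 0, loss = log 2
dpo_loss 0.5 -10 -12 -10 -12   # 0.6931
# Policy raised the chosen log-prob and lowered the rejected one by 2 nats each: z = 2
dpo_loss 0.5 -10 -14 -12 -12   # 0.1269
```

In the DPO derivation, β is the coefficient of the KL penalty toward the reference model, so larger values keep the policy closer to the SFT checkpoint.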
@@ -0,0 +1,43 @@
# Model arguments
model_name_or_path: loubnabnl/smollm2-360M-sft # we use this config for the 135M model too
torch_dtype: bfloat16

# Data training arguments
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0

dataset_splits:
- train_prefs
- test_prefs
preprocessing_num_workers: 12

# DPOTrainer arguments
bf16: true
beta: 0.5
do_eval: true
hub_private_repo: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
hub_model_id: smollm2-360M-dpo
learning_rate: 1.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: cosine
max_length: 1024
max_prompt_length: 512
num_train_epochs: 2
optim: adamw_torch
output_dir: data/smollm2-360M-dpo
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
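All four configs in this recipe pair `lr_scheduler_type: cosine` with `warmup_ratio: 0.1`. A minimal sketch of the resulting learning-rate curve, assuming the `transformers` cosine schedule with warmup (linear ramp from 0 over `ceil(warmup_ratio * total_steps)` steps, then a half-cosine decay to 0); `lr_at` is a hypothetical helper and the 1000-step run is an example, not the real step count of any config:

```shell
#!/usr/bin/env bash
# Learning rate at a given optimizer step for cosine decay with linear warmup:
#   warmup = ceil(warmup_ratio * total_steps)
#   step <  warmup: peak_lr * step / warmup
#   step >= warmup: peak_lr * 0.5 * (1 + cos(pi * (step - warmup) / (total_steps - warmup)))
# Args: step total_steps peak_lr warmup_ratio
lr_at() {
  awk -v s="$1" -v T="$2" -v lr="$3" -v r="$4" 'BEGIN {
    w = int(r * T); if (w < r * T) w++          # ceil for non-negative values
    if (s < w) { printf "%.6g\n", lr * s / w; exit }
    denom = T - w; if (denom < 1) denom = 1
    pi = atan2(0, -1)
    v = 0.5 * (1 + cos(pi * (s - w) / denom)); if (v < 0) v = 0
    printf "%.6g\n", lr * v
  }'
}

# Example: a hypothetical 1000-step run with the 1.7B SFT learning rate
for step in 0 50 100 550 1000; do
  echo "step $step: lr $(lr_at "$step" 1000 3.0e-04 0.1)"
done
```

The peak is reached at the end of warmup, and the midpoint of the decay phase sits at half the peak learning rate.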
@@ -0,0 +1,49 @@
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-1.7B
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-1.7B-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smoltalk: 1.0

dataset_configs:
- all

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
eval_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-1.7B-sft
hub_strategy: every_save
learning_rate: 3.0e-04
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-1.7B-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
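The SFT configs load the tokenizer from the matching SmolLM2-Instruct repo because, per the inline comment, it carries the `<|im_start|>` and `<|im_end|>` tokens. These delimit conversation turns in the ChatML layout. A minimal sketch of that layout, assuming plain ChatML; the chat template shipped with the tokenizer may add details such as a default system message, so `tokenizer.chat_template` remains the authoritative format. `chatml_turn` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Render one message in ChatML layout:
#   <|im_start|>{role}\n{content}<|im_end|>\n
chatml_turn() {
  printf '<|im_start|>%s\n%s<|im_end|>\n' "$1" "$2"
}

# A two-turn exchange, roughly as an SFT example is laid out before tokenization
{
  chatml_turn user "What is 2 + 2?"
  chatml_turn assistant "2 + 2 = 4."
}
```

Because the base `SmolLM2-1.7B` checkpoint is paired with the Instruct tokenizer, these delimiters are available as dedicated tokens during SFT rather than being split into sub-word pieces.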
@@ -0,0 +1,46 @@
# Model arguments
model_name_or_path: HuggingFaceTB/SmolLM2-360M # we use this config for the 135M model too
model_revision: main
tokenizer_name_or_path: HuggingFaceTB/SmolLM2-360M-Instruct # Custom tokenizer with <|im_start|> and <|im_end|> tokens
torch_dtype: bfloat16
use_flash_attention_2: true

# Data training arguments
dataset_mixer:
  HuggingFaceTB/smol-smoltalk: 1.0

dataset_splits:
- train
- test
preprocessing_num_workers: 36

# SFT trainer config
bf16: true
do_eval: true
eval_strategy: epoch
gradient_accumulation_steps: 4
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: smollm2-360M-sft
hub_strategy: every_save
learning_rate: 1.0e-03 # 3e-4
log_level: info
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_seq_length: 8192
max_steps: -1
num_train_epochs: 2
output_dir: data/smollm2-360M-sft
overwrite_output_dir: true
per_device_eval_batch_size: 4
per_device_train_batch_size: 4
push_to_hub: true
remove_unused_columns: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_ratio: 0.1
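Both small-model configs note that they are reused for the 135M model. A minimal sketch of one way to do that without copying the files, by overriding only the fields that name the model on the command line; it assumes the training scripts accept `--key=value` overrides after the YAML path and that the SFT run saves its final model to `output_dir`. The `smollm2-135M-*` hub IDs and output paths are hypothetical names that follow the pattern of the existing configs, and the commands are printed rather than run:

```shell
#!/usr/bin/env bash
# Shared launcher prefix, copied from the commands in the README above.
ACCEL="ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml"

# SFT: start from the 135M base model instead of the 360M one (hypothetical hub ID/paths).
sft_135m_cmd() {
  echo "$ACCEL scripts/run_sft.py recipes/smollm2/sft/config_smol.yaml" \
    "--model_name_or_path=HuggingFaceTB/SmolLM2-135M" \
    "--hub_model_id=smollm2-135M-sft" \
    "--output_dir=data/smollm2-135M-sft"
}

# DPO: $1 is the 135M SFT checkpoint to start from (a Hub ID or local path).
dpo_135m_cmd() {
  echo "$ACCEL scripts/run_dpo.py recipes/smollm2/dpo/config_smol.yaml" \
    "--model_name_or_path=$1" \
    "--hub_model_id=smollm2-135M-dpo" \
    "--output_dir=data/smollm2-135M-dpo"
}

sft_135m_cmd
dpo_135m_cmd data/smollm2-135M-sft
```

Fields the recipe does not mention for 135M, such as `tokenizer_name_or_path`, are left as they are in the configs.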