Refactor imports

2026-06-27 19:46:04 +08:00 · 2023-11-10 13:38:45 +00:00
parent 7f1a14e0d4
commit e2e8ab945d
6 changed files with 58 additions and 18 deletions
@@ -1,7 +1,7 @@

-## Scripts to Train and Evaluate Chat Models
+# Scripts to Train and Evaluate Chat Models

-### Fine-tuning
+## Fine-tuning

 In the handbook, we provide three main ways to align LLMs for chat:

@@ -47,7 +47,7 @@ By default all training metrics are logged with TensorBoard. If you have a [Weig
 ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_{task}.py recipes/{model_name}/{task}/config_full.yaml --report_to=wandb
 ```

-### Launching jobs on a Slurm cluster
+## Launching jobs on a Slurm cluster

 If you have access to a Slurm cluster, we provide a `recipes/launch.slurm` script that will automatically queue training jobs for you. Here's how you can use it:

@@ -63,4 +63,39 @@ sbatch --job-name=handbook_sft --nodes=1 recipes/launch.slurm zephyr-7b-beta sft

 You can scale the number of nodes by increasing the `--nodes` flag.

-**⚠️ Note:** the configuration in `recipes/launch.slurm` is optimised for the Hugging Face Compute Cluster and may require tweaking to be adapted to your own compute nodes.
+**⚠️ Note:** the configuration in `recipes/launch.slurm` is optimised for the Hugging Face Compute Cluster and may require tweaking to be adapted to your own compute nodes.
+
+## Fine-tuning on custom datasets
+
+Under the hood, each training script uses the `get_datasets()` function which allows one to easily combing multiple datasets with varying proportions. For instance, this is how one can specify multiple datasets and which splits to combine in one of the YAML configs:
+
+```yaml
+datasets_mixer:
+    dataset_1: 0.5  # Use 50% of the training examples
+    dataset_2: 0.66 # Use 66% of the training examples
+    dataset_3: 0.10 # Use 10% of the training examples
+dataset_splits:
+- train_x           # Samples from each train split
+- test_x            # Test splits aren't sampled
+```
+
+If you want to fine-tune on your own datasets, the main thing to keep in mind is how the chat templates are applied to the dataset blend. Since each task (SFT, DPO, etc), requires a different format, we assume the datasets have the following columns:
+
+**SFT**
+
+* `messages`: A list of `dicts` in the form `{"role": "{role}", "content": {content}}`. 
+* See [ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) for an example.
+
+**DPO**
+
+* `chosen`: A list of `dicts` in the form `{"role": "{role}", "content": {content}}` corresponding to the preferred dialogue.
+* `rejected`: A list of `dicts` in the form `{"role": "{role}", "content": {content}}` corresponding to the dispreferred dialogue.
+* See [ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized) for an example.
+
+We also find it useful to include dedicated splits per task in our datasets, so e.g. we have:
+
+* `{train,test}_sft`: Splits for SFT training.
+* `{train,test}_gen`: Splits for generation ranking like rejection sampling or PPO.
+* `{train,test}_prefs`: Splits for preference modelling, like reward modelling or DPO.
+
+If you format your dataset in the same way, our training scripts should work out of the box!
@@ -18,7 +18,7 @@ import sys

 import torch
 import transformers
-from transformers import set_seed
+from transformers import AutoModelForCausalLM, set_seed

 from accelerate import Accelerator
 from alignment import (
@@ -32,11 +32,11 @@ from alignment import (
    get_peft_config,
    get_quantization_config,
    get_tokenizer,
+    is_adapter_model,
 )
-from trl import DPOTrainer
-from transformers import AutoModelForCausalLM
-from alignment.model_utils import is_adapter_model
 from peft import PeftConfig, PeftModel
+from trl import DPOTrainer
+

 logger = logging.getLogger(__name__)

@@ -114,15 +114,15 @@ def main():
        device_map=get_kbit_device_map(),
        quantization_config=get_quantization_config(model_args),
    )
-    
+
    model = model_args.model_name_or_path
    if is_adapter_model(model, model_args.model_revision):
        # load the model, merge the adapter weights and unload the adapter
        # Note: to run QLora, you will need to merge the based model separately as the merged model in 16bit
        logger.info(f"Merging peft adapters for {model_args.model_name_or_path=}")
-        
+
        peft_config = PeftConfig.from_pretrained(model_args.model_name_or_path, revision=model_args.model_revision)
-        
+
        model_kwargs = dict(
            revision=model_args.base_model_revision,
            trust_remote_code=model_args.trust_remote_code,
@@ -131,9 +131,12 @@ def main():
            use_cache=False if training_args.gradient_checkpointing else True,
        )
        base_model = AutoModelForCausalLM.from_pretrained(
-            peft_config.base_model_name_or_path, **model_kwargs,
+            peft_config.base_model_name_or_path,
+            **model_kwargs,
+        )
+        model = PeftModel.from_pretrained(
+            base_model, model_args.model_name_or_path, revision=model_args.model_revision
        )
-        model = PeftModel.from_pretrained(base_model, model_args.model_name_or_path, revision=model_args.model_revision)
        model.eval()
        model = model.merge_and_unload()
        model_kwargs = None