# Train using supervised examples

## Requirements

`pip install -r requirements.txt`

Start training the SFT model:

```bash
python trainer.py --configs defaults galactica-125m
```

For `wandb`: update the `entity` argument in `trainer.py`'s call to `wandb.init`
to your Weights & Biases username, per the
[docs](https://docs.wandb.ai/ref/python/init).

## Dataset choices

To choose a translation pair for
[WMT](https://huggingface.co/datasets/wmt19) or
[TED Talk](https://huggingface.co/datasets/ted_talks_iwslt) translation, append
a supported language pair to the dataset name as a suffix:

```
datasets:
  - wmt2019_zh-en
  - wmt2019_ru-en
  - wmt2019_de-en
  - ted_trans_nl-en
  - ted_trans_de-ja
```

Currently only these languages are supported via prompt translation:

```
ar,de,fr,en,it,nl,tr,ru,ms,ko,ja,zh
```

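The naming convention above can be checked before a run starts. The sketch
below is not part of the repository: `split_translation_dataset` and
`SUPPORTED_LANGS` are hypothetical names, and it assumes that the language pair
is everything after the last underscore and that both sides of the pair must
appear in the supported list.

```python
from typing import Tuple

# Hypothetical helper, not part of the repository.
SUPPORTED_LANGS = {"ar", "de", "fr", "en", "it", "nl", "tr", "ru", "ms", "ko", "ja", "zh"}


def split_translation_dataset(name: str) -> Tuple[str, str, str]:
    """Split e.g. 'wmt2019_zh-en' into ('wmt2019', 'zh', 'en')."""
    # Assumption: the pair is the part after the LAST underscore.
    base, sep, pair = name.rpartition("_")
    if not sep or pair.count("-") != 1:
        raise ValueError(f"expected '<dataset>_<src>-<tgt>', got {name!r}")
    src, tgt = pair.split("-")
    for lang in (src, tgt):
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language {lang!r} in {name!r}")
    return base, src, tgt
```

Calling `split_translation_dataset("ted_trans_nl-en")` yields the base name
`ted_trans` with the pair `nl` and `en`; an unsupported code raises a
`ValueError` instead of failing later during data loading.
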
## Dataset sub-sampling

We can subsample the **training** data by passing either the `fraction` or
`size` argument in the `configs/config.yml` file. Don't forget the additional
colon ":" after the dataset name when doing this.

Example:

```
datasets:
  - webgpt:
      fraction : 0.05
  - prompt_dialogue:
      size : 500
  - adversarial_qa
  - trivia_qa_nocontext
```

In this example, per epoch we will use:

- A random 5% of `webgpt`;
- A random 500 examples from `prompt_dialogue`;
- All examples from datasets for which we don't specify the `fraction` or `size`
  argument.

The sample is redrawn every epoch, so each epoch uses a different 5% of `webgpt`
and a different 500 examples from `prompt_dialogue`.

This works with `torch.distributed`.

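The per-epoch behaviour described above can be sketched roughly as follows.
This is an illustration, not the repository's sampler: `parse_entry` and
`subsample_indices` are hypothetical names, and seeding the draw with
`seed + epoch` is an assumption about how every process could arrive at the
same subset while still getting a new one each epoch.

```python
import random
from typing import Dict, Union

# A config entry is either a bare name or a one-key mapping (note the colon).
Entry = Union[str, Dict[str, Dict[str, float]]]


def parse_entry(entry: Entry):
    """Return (name, fraction, size); options that are not set are None."""
    if isinstance(entry, str):
        return entry, None, None
    (name, opts), = entry.items()  # exactly one dataset name per entry
    opts = opts or {}
    return name, opts.get("fraction"), opts.get("size")


def subsample_indices(n, fraction=None, size=None, epoch=0, seed=0):
    """Pick which of the n training examples to use in the given epoch."""
    if fraction is not None:
        k = int(n * fraction)
    elif size is not None:
        k = min(int(size), n)
    else:
        return list(range(n))  # no sub-sampling requested: use everything
    # Assumption: an epoch-dependent seed gives a fresh but reproducible draw.
    rng = random.Random(seed + epoch)
    return sorted(rng.sample(range(n), k))
```

With a 1000-example dataset and `fraction=0.05`, each epoch selects 50 indices,
and changing `epoch` changes which 50 are selected.
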
## Model

Normally you should be able to add new models in `configs/config.yml`:

```
your-model-name:
  learning_rate: 2e-6
  model_name: <huggingface model name>
  weight_decay: 0.01
  max_length: 812
  warmup_steps: 600
  gradient_checkpointing: false
  gradient_accumulation_steps: 5
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
```

Then start training with the new config name:

```
python trainer.py --configs defaults your-model-name
```

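`--configs defaults your-model-name` names two sections of
`configs/config.yml`. One plausible reading, which the sketch below assumes
rather than takes from `trainer.py`, is that the sections are applied in order
so that later ones override earlier ones. `merge_configs` and the values under
`defaults` are hypothetical; only the `your-model-name` values come from the
example above.

```python
from typing import Any, Dict, Iterable


def merge_configs(sections: Dict[str, Dict[str, Any]], names: Iterable[str]) -> Dict[str, Any]:
    """Apply the named sections left to right; keys in later sections win."""
    merged: Dict[str, Any] = {}
    for name in names:
        if name not in sections:
            raise KeyError(f"unknown config section: {name}")
        merged.update(sections[name])
    return merged


# Illustrative values only; the "defaults" entries here are made up.
sections = {
    "defaults": {"learning_rate": 1e-5, "weight_decay": 0.0, "per_device_train_batch_size": 2},
    "your-model-name": {
        "learning_rate": 2e-6,
        "per_device_train_batch_size": 4,
        "gradient_accumulation_steps": 5,
    },
}
conf = merge_configs(sections, ["defaults", "your-model-name"])

# Examples consumed per optimizer step on each device:
# per-device batch size times gradient accumulation steps.
per_device_effective_batch = (
    conf["per_device_train_batch_size"] * conf["gradient_accumulation_steps"]
)
```

Under that assumption, the model section only needs to list the keys that
differ from `defaults`, and the example settings above amount to 20 examples
per optimizer step on each device.
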
If the tokenizer of your chosen model doesn't define `pad_token`, `eos_token`,
or `sep_token`, you have to update `get_tokenizer` in `utils.py` to use the
right tokens.

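Before editing `get_tokenizer`, it helps to know which of those tokens are
actually missing. The check below is a minimal sketch, not code from
`utils.py`: `missing_special_tokens` is a hypothetical helper, and it assumes
the tokenizer exposes each token as an attribute that is `None` when unset, as
Hugging Face tokenizers do. A stand-in object is used so the example runs
without downloading a model.

```python
from types import SimpleNamespace

REQUIRED_TOKENS = ("pad_token", "eos_token", "sep_token")


def missing_special_tokens(tokenizer) -> list:
    """List the required special tokens that are unset (None or absent)."""
    return [name for name in REQUIRED_TOKENS if getattr(tokenizer, name, None) is None]


# Stand-in for a real tokenizer, for demonstration only.
fake_tokenizer = SimpleNamespace(pad_token=None, eos_token="</s>", sep_token=None)
missing = missing_special_tokens(fake_tokenizer)
print(missing)
```

Any name in the returned list is one that `get_tokenizer` would need to map to
a suitable token for that model.
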
## DeepSpeed support

You can edit `configs/zero_config.json` and use any ZeRO stage you wish. The
current config uses ZeRO stage 3. For more details on how to set up the config,
check out [this page](https://www.deepspeed.ai/tutorials/zero/).

Once you are satisfied with your DeepSpeed config, add the `--deepspeed` flag at
the end of the command to enable DeepSpeed:

```
python trainer.py --configs defaults your-model-name --deepspeed
```

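Switching the ZeRO stage is a one-key change in the JSON config. The helper
below is a sketch under assumptions: `set_zero_stage` is a hypothetical name,
and the only thing it relies on is DeepSpeed's `zero_optimization.stage` key.
The other contents of the repository's `configs/zero_config.json` are not
assumed and are left untouched.

```python
import json
from pathlib import Path


def set_zero_stage(config_path: str, stage: int) -> dict:
    """Rewrite the ZeRO stage in a DeepSpeed JSON config and return the result."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0-3, got {stage}")
    path = Path(config_path)
    config = json.loads(path.read_text())
    # Create the section if it is missing, then set only the stage.
    config.setdefault("zero_optimization", {})["stage"] = stage
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `set_zero_stage("configs/zero_config.json", 2)` would move the
config from stage 3 to stage 2. Stage-specific options already in the file are
not adjusted by this sketch, so review them against the DeepSpeed docs after
changing the stage.
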
## Results

Experimental results are available in wandb
[here](https://wandb.ai/sanagnos/supervised-finetuning?workspace=user-sanagnos).

## TODOS

- Decide on a model
- Merge utils etc with reward model
- Causal modelling for GPT-JT does not leverage the bidirectional mask for the
  prompt? (https://huggingface.co/togethercomputer/GPT-JT-6B-v1)