Train using supervised examples
Requirements
pip install -r requirements.txt
Start training SFT model
python trainer.py --configs defaults galactica-125m
For wandb: update the entity argument in trainer.py's call to wandb.init
to be your Weights & Biases username, as described in the wandb docs.
Dataset choices
To choose a translation pair for the WMT and TED Talk translation datasets, append a supported language pair as a suffix to the dataset name:
datasets:
- wmt2019_zh-en
- wmt2019_ru-en
- wmt2019_de-en
- ted_trans_nl-en
- ted_trans_de-ja
Currently only these languages are supported via prompt translation:
ar,de,fr,en,it,nl,tr,ru,ms,ko,ja,zh
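To make the naming convention concrete, here is a minimal sketch of how a dataset name with a language-pair suffix could be split and checked. The helper `split_language_pair` is hypothetical and is not taken from this repository; checking both languages against the list above is also an assumption.

```python
from typing import Tuple

# Languages listed above as supported via prompt translation.
SUPPORTED_LANGS = {"ar", "de", "fr", "en", "it", "nl", "tr", "ru", "ms", "ko", "ja", "zh"}


def split_language_pair(dataset_name: str) -> Tuple[str, str, str]:
    """Split e.g. 'ted_trans_de-ja' into ('ted_trans', 'de', 'ja')."""
    # The pair is everything after the last underscore.
    base, sep, pair = dataset_name.rpartition("_")
    if not sep or "-" not in pair:
        raise ValueError(f"no language-pair suffix in {dataset_name!r}")
    src, tgt = pair.split("-", 1)
    for lang in (src, tgt):
        if lang not in SUPPORTED_LANGS:
            raise ValueError(f"unsupported language {lang!r} in {dataset_name!r}")
    return base, src, tgt


print(split_language_pair("wmt2019_zh-en"))
print(split_language_pair("ted_trans_de-ja"))
```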
Dataset sub-sampling
We can subsample the training data by passing either the fraction or
size argument in the configs/config.yml file. Don't forget the additional
colon ":" after the dataset name when doing this.
Example:
datasets:
  - webgpt:
      fraction: 0.05
  - prompt_dialogue:
      size: 500
  - adversarial_qa
  - trivia_qa_nocontext
In this example, per epoch we will use:
- A random 5% of webgpt;
- A random 500 examples from prompt_dialogue;
- All examples from datasets for which we don't specify the fraction or size argument.
The subsample is redrawn every epoch, so each epoch uses a different 5% of
webgpt and a different 500 examples from prompt_dialogue.
This works with torch.distributed.
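As a rough illustration of the behaviour described above, here is a minimal sketch of per-epoch subsampling. It is not the code used by trainer.py: the function name `subsample_indices` and the `seed` and `epoch` parameters are assumptions made for this example. Seeding from `seed + epoch` is one common way to get a fresh draw each epoch while keeping every distributed worker in agreement on which examples were picked.

```python
import random
from typing import List, Optional


def subsample_indices(
    n_examples: int,
    fraction: Optional[float] = None,
    size: Optional[int] = None,
    seed: int = 0,
    epoch: int = 0,
) -> List[int]:
    """Return the example indices to use for one epoch (hypothetical helper)."""
    if fraction is not None and size is not None:
        raise ValueError("pass either fraction or size, not both")
    if fraction is not None:
        k = int(n_examples * fraction)
    elif size is not None:
        k = min(size, n_examples)
    else:
        # Neither argument given: use every example.
        return list(range(n_examples))
    # Same seed and epoch give the same draw on every process;
    # a new epoch gives a new draw.
    rng = random.Random(seed + epoch)
    return rng.sample(range(n_examples), k)


epoch0 = subsample_indices(1000, size=500, epoch=0)
epoch1 = subsample_indices(1000, size=500, epoch=1)
print(len(epoch0), len(epoch1), epoch0 == epoch1)
```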
Model
Normally you should be able to add a new model by adding an entry to configs/config.yml:
your-model-name:
  learning_rate: 2e-6
  model_name: <huggingface model name>
  weight_decay: 0.01
  max_length: 812
  warmup_steps: 600
  gradient_checkpointing: false
  gradient_accumulation_steps: 5
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
python trainer.py --configs defaults your-model-name
However, if the model of your choice doesn't define pad_token, eos_token, or
sep_token, you have to update get_tokenizer in utils.py to set the right
tokens.
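The check described above could look roughly like the sketch below. It does not reproduce get_tokenizer from utils.py: the helper `missing_special_tokens` and the fallback token strings are assumptions for illustration, and a `SimpleNamespace` stands in for a real tokenizer so the example runs without downloading a model.

```python
from types import SimpleNamespace
from typing import Dict

REQUIRED_TOKENS = ("pad_token", "eos_token", "sep_token")

# Hypothetical fallback strings; pick ones that suit your model.
FALLBACKS = {"pad_token": "<pad>", "eos_token": "</s>", "sep_token": "<sep>"}


def missing_special_tokens(tokenizer, fallbacks: Dict[str, str]) -> Dict[str, str]:
    """Return a dict of the required special tokens the tokenizer does not define."""
    missing = {}
    for name in REQUIRED_TOKENS:
        if getattr(tokenizer, name, None) is None:
            missing[name] = fallbacks[name]
    return missing


# Stand-in object with the same attributes as a tokenizer lacking pad and sep tokens.
fake_tokenizer = SimpleNamespace(pad_token=None, eos_token="</s>", sep_token=None)
to_add = missing_special_tokens(fake_tokenizer, FALLBACKS)
print(to_add)
```

With a real Hugging Face tokenizer, the resulting dict can be passed to `tokenizer.add_special_tokens(to_add)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix covers the new tokens.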
Deepspeed support
You can edit configs/zero_config.json and use any ZeRO stage you wish. The current config uses ZeRO stage 3. For more details on how to set up the config, check out this page.
Once you are satisfied with your DeepSpeed ZeRO config, add the --deepspeed flag at the end of the command to enable DeepSpeed:
python trainer.py --configs defaults your-model-name --deepspeed
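If you prefer to switch stages programmatically, the sketch below shows one way to do it. It assumes configs/zero_config.json follows the usual DeepSpeed layout, where the stage is stored under `zero_optimization.stage`; the actual contents of that file are not shown here, and `with_zero_stage` is a hypothetical helper.

```python
import copy
import json


def with_zero_stage(config: dict, stage: int) -> dict:
    """Return a copy of a DeepSpeed config dict with the ZeRO stage replaced."""
    if stage not in (0, 1, 2, 3):
        raise ValueError(f"ZeRO stage must be 0, 1, 2 or 3, got {stage}")
    updated = copy.deepcopy(config)
    updated.setdefault("zero_optimization", {})["stage"] = stage
    return updated


# Minimal stand-in for the file contents (assumed layout).
example = {"zero_optimization": {"stage": 3}}
print(json.dumps(with_zero_stage(example, 2), indent=2))
```

To apply this to the real file, load it with `json.load`, pass the result through the helper, and write it back with `json.dump`.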
Results
Experimental results in wandb here.
TODOS
- Decide on a model
- Merge utils etc with reward model
- Causal modelling for GPT-JT does not leverage the bidirectional mask for the prompt? (https://huggingface.co/togethercomputer/GPT-JT-6B-v1)