Sections to train Reward Model (RM)
The trainer code is based on Hugging Face and is compatible with DeepSpeed or Accelerate.
Install Python requirements
pip install -r requirements.txt
Write or inherit a configs/<config-name>.yml file to store training
configuration details.
The configuration file must have at least all the keys present in
configs/dummy.yml
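A minimal sketch of how the "at least all the keys present in configs/dummy.yml" requirement could be checked before training. The helper names `missing_keys` and `load_yaml` are hypothetical and not part of this repository, the keys in the example are made up for illustration, and PyYAML is assumed to be installed. Only top-level keys are compared.

```python
def missing_keys(config, reference):
    """Return the top-level keys of `reference` that `config` lacks, sorted."""
    return sorted(set(reference) - set(config))


def load_yaml(path):
    """Load a YAML file into a dict (assumes PyYAML is installed)."""
    import yaml  # imported lazily so missing_keys works without PyYAML

    with open(path) as f:
        return yaml.safe_load(f) or {}


# Illustration with in-memory dicts; these key names are hypothetical.
reference = {"model_name": "", "learning_rate": 0.0, "num_train_epochs": 0}
candidate = {"model_name": "bigscience/bloomz-560m", "learning_rate": 1e-5}
print(missing_keys(candidate, reference))
```

With real files, the same check would be `missing_keys(load_yaml("configs/<config-name>.yml"), load_yaml("configs/dummy.yml"))`; an empty list means every key in the dummy config is present.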
Run training procedure
python trainer.py configs/<config-name>.yml
Additional axis labeling: this outputs four summary-quality evaluation metrics (scores are normalized to the range 0-1).
python summary_quality_trainer.py configs/test-bloomz-560m-quality.yml
The four metrics are:
- overall
- accuracy
- coverage
- coherence
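As an illustration of what normalizing scores to 0-1 can look like, here is a minimal min-max sketch. It assumes raw ratings on a 1-7 scale; that range, the function name `normalize_score`, and the sample ratings are assumptions made for this example, and the actual normalization in summary_quality_trainer.py may differ.

```python
def normalize_score(raw, lo=1.0, hi=7.0):
    """Min-max scale a raw rating into [0, 1].

    `lo` and `hi` are the assumed bounds of the raw rating scale.
    Values outside the bounds are clipped before scaling.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(raw, lo), hi)
    return (clipped - lo) / (hi - lo)


# Hypothetical raw ratings for one summary, one per axis.
ratings = {"overall": 6, "accuracy": 7, "coverage": 4, "coherence": 1}
normalized = {axis: normalize_score(value) for axis, value in ratings.items()}
print(normalized)
```

Scaling each axis independently keeps the four targets on the same range, so no single axis dominates a regression loss simply because of its scale.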
Dataset
For now, only the WebGPT and summary datasets from OpenAI are supported. The Open-Assistant dataset will be added here once it is available.
Model
Check out the example configs in
Open-Assistant/model/reward/instructor/configs/
bloomz-560m.yml
electra-base-dis-webgpt.yml
galactica-125m.yml
galactica-1b.yml
You can add other Hugging Face models as needed.