Update README.md

xiamengzhou
2024-07-15 09:26:20 -04:00
committed by GitHub
parent 3e9c4cc3bd
commit 219e6c2ac9
+2
@@ -57,6 +57,8 @@ Our released Llama3 models use the initial version of the Llama3 tokenizer (prio
*Notably, if you are training Llama3 and evaluating the trained models on AlpacaEval 2 and Arena-Hard using the templates provided in this repo, please make sure to use the pre-update Llama3 tokenizer (i.e., the one before the PR).*
### Reproducing AlpacaEval 2 numbers
Please make sure to use `AlpacaEval==0.6.2` when reproducing the AlpacaEval 2 results.
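As a quick sanity check before running the evaluation, the installed version can be compared against the pin. This is a minimal sketch, not part of this repo: the helper names are hypothetical, and the distribution name `alpaca-eval` is an assumption about how the package is registered in your environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(dist_name: str, required: str) -> bool:
    """True only when the distribution is installed at exactly the required version."""
    return installed_version(dist_name) == required


if __name__ == "__main__":
    # "alpaca-eval" is an assumed distribution name; adjust it if yours differs.
    if not matches_pin("alpaca-eval", "0.6.2"):
        print("Warning: AlpacaEval 0.6.2 not found; results may not reproduce.")
```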
### Adding an extra SFT loss
The [CPO_SIMPO](https://github.com/fe1ixxu/CPO_SIMPO/tree/main) repository ran preliminary experiments and observed that, in some cases, adding an SFT loss can improve results. In our own experiments, SFT regularization helps preserve reasoning ability (e.g., on GSM8K) but degrades chat performance. To apply SFT regularization, set `sft_weight` to a positive value (the default is 0).
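To illustrate how such a weight might enter the objective, here is a minimal per-example sketch in plain Python. It assumes the SFT term is the length-normalized negative log-likelihood of the chosen response, added to the SimPO preference loss with weight `sft_weight`. The function names and default values of `beta` and `gamma` are illustrative only, and the trainer in this repo may compute the SFT term differently.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_loss_with_sft(
    chosen_avg_logp: float,    # length-normalized log-prob of the chosen response
    rejected_avg_logp: float,  # length-normalized log-prob of the rejected response
    beta: float = 2.0,         # illustrative value, not a recommended setting
    gamma: float = 1.0,        # target reward margin, illustrative value
    sft_weight: float = 0.0,   # 0 disables the SFT term
) -> float:
    # SimPO preference term: -log sigmoid(beta * (chosen - rejected) - gamma)
    margin = beta * (chosen_avg_logp - rejected_avg_logp) - gamma
    preference_loss = softplus(-margin)

    # Assumed SFT term: negative log-likelihood of the chosen response.
    sft_loss = -chosen_avg_logp

    return preference_loss + sft_weight * sft_loss


if __name__ == "__main__":
    print(simpo_loss_with_sft(-0.5, -1.5))                  # preference loss only
    print(simpo_loss_with_sft(-0.5, -1.5, sft_weight=0.1))  # with SFT regularization
```

With `sft_weight=0` the function reduces to the plain preference loss, which matches the default behavior described above.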