Update README.md

This commit is contained in:
Yu Meng
2024-07-09 14:29:01 -04:00
committed by GitHub
parent 2d5cbf1dad
commit 3235adf17a
+11 -11
View File
@@ -62,7 +62,7 @@ models |
| Mistral Base 7B SLiC-HF | [princeton-nlp/Mistral-7B-Base-SFT-SLiC-HF](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-SLiC-HF) | 10.9 | 8.9 | 7.3 |
| Mistral Base 7B DPO (Zephyr) | [princeton-nlp/Mistral-7B-Base-SFT-DPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-DPO) | 15.1 | 12.5 | 10.4 |
| Mistral Base 7B IPO | [princeton-nlp/Mistral-7B-Base-SFT-IPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-IPO) | 11.8 | 9.4 | 7.5 |
| Mistral Base 7B CPO | [princeton-nlp/Mistral-7B-Base-SFT-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-CPO) | 13.1 | 9.1 | 5.6 |
| Mistral Base 7B CPO | [princeton-nlp/Mistral-7B-Base-SFT-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-CPO) | 9.8 | 8.9 | 6.9 |
| Mistral Base 7B KTO | [princeton-nlp/Mistral-7B-Base-SFT-KTO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-KTO) | 13.1 | 9.1 | 5.6 |
| Mistral Base 7B ORPO | [kaist-ai/mistral-orpo-beta](https://huggingface.co/kaist-ai/mistral-orpo-beta) | 14.7 | 12.2 | 7.0 |
| Mistral Base 7B R-DPO | [princeton-nlp/Mistral-7B-Base-SFT-RDPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-RDPO) | 17.4 | 12.8 | 9.9 |
@@ -72,7 +72,7 @@ models |
| Mistral Instruct 7B SLiC-HF | [princeton-nlp/Mistral-7B-Instruct-SLiC-HF](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-SLiC-HF) | 24.1 | 24.6 | 18.9 |
| Mistral Instruct 7B DPO | [princeton-nlp/Mistral-7B-Instruct-DPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-DPO) | 26.8 | 24.9 | 16.3 |
| Mistral Instruct 7B IPO | [princeton-nlp/Mistral-7B-Instruct-IPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-IPO) | 20.3 | 20.3 | 16.2 |
| Mistral Instruct 7B CPO | [princeton-nlp/Mistral-7B-Instruct-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-CPO) | 24.5 | 23.6 | 17.9 |
| Mistral Instruct 7B CPO | [princeton-nlp/Mistral-7B-Instruct-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-CPO) | 23.8 | 28.8 | 22.6 |
| Mistral Instruct 7B KTO | [princeton-nlp/Mistral-7B-Instruct-KTO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-KTO) | 24.5 | 23.6 | 17.9 |
| Mistral Instruct 7B ORPO | [princeton-nlp/Mistral-7B-Instruct-ORPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-ORPO) | 24.5 | 24.9 | 20.8 |
| Mistral Instruct 7B R-DPO | [princeton-nlp/Mistral-7B-Instruct-RDPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-RDPO) | 27.3 | 24.5 | 16.1 |
@@ -82,7 +82,7 @@ models |
| Llama3 Base 8B SLiC-HF | [princeton-nlp/Llama-3-Base-8B-SLiC-HF](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SLiC-HF) | 12.1 | 10.1 | 10.3 |
| Llama3 Base 8B DPO | [princeton-nlp/Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO) | 18.2 | 15.5 | 15.9 |
| Llama3 Base 8B IPO | [princeton-nlp/Llama-3-Base-8B-SFT-IPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-IPO) | 14.4 | 14.2 | 17.8 |
| Llama3 Base 8B CPO | [princeton-nlp/Llama-3-Base-8B-SFT-CPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-CPO) | 14.4 | 14.2 | 17.8 |
| Llama3 Base 8B CPO | [princeton-nlp/Llama-3-Base-8B-SFT-CPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-CPO) | 10.8 | 8.1 | 5.8 |
| Llama3 Base 8B KTO | [princeton-nlp/Llama-3-Base-8B-SFT-KTO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-KTO) | 14.2 | 12.4 | 12.5 |
| Llama3 Base 8B ORPO | [princeton-nlp/Llama-3-Base-8B-SFT-ORPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-ORPO) | 12.2 | 10.6 | 10.8 |
| Llama3 Base 8B R-DPO | [princeton-nlp/Llama-3-Base-8B-SFT-RDPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-RDPO) | 17.6 | 14.4 | 17.2 |
@@ -103,14 +103,14 @@ We found that using a strong reward model for annotating preference optimization
| models | | AE2 LC | AE2 WR | AH |
|------------------------------|-----------------------------------------------------------------------------------------------------------|:------:|:------:|:----:|
| Llama 3 Instruct 8B RRHF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RRHF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RRHF-v0.2) | 26.0 | 25.3 | 22.3 |
| Llama 3 Instruct 8B SLiC-HF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v0.2) | 37.9 | 31.6 | 28.8 |
| Llama 3 Instruct 8B DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2) | 33.9 | 32.5 | 29.3 |
| Llama 3 Instruct 8B IPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2) | 48.2 | 47.5 | 35.2 |
| Llama 3 Instruct 8B CPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2) | 46.8 | 42.4 | 36.6 |
| Llama 3 Instruct 8B KTO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2) | 34.1 | 36.4 | 30.9 |
| Llama 3 Instruct 8B ORPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2) | 34.1 | 32.1 | 27.3 |
| Llama 3 Instruct 8B R-DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2) | 38.1 | 33.8 | 28.2 |
| Llama 3 Instruct 8B RRHF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RRHF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RRHF-v0.2) | 37.9 | 31.6 | 28.8 |
| Llama 3 Instruct 8B SLiC-HF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v0.2) | 33.9 | 32.5 | 29.3 |
| Llama 3 Instruct 8B DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2) | 48.2 | 47.5 | 35.2 |
| Llama 3 Instruct 8B IPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2) | 46.8 | 42.4 | 36.6 |
| Llama 3 Instruct 8B CPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2) | 34.1 | 36.4 | 30.9 |
| Llama 3 Instruct 8B KTO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2) | 34.1 | 32.1 | 27.3 |
| Llama 3 Instruct 8B ORPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2) | 38.1 | 33.8 | 28.2 |
| Llama 3 Instruct 8B R-DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2) | 48.0 | 45.8 | 35.1 |
| Llama 3 Instruct 8B SimPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2) | 53.7 | 47.5 | 36.5 |
### Use our models for inference