Update README.md

2026-06-27 19:29:20 +08:00 · 2024-07-09 14:29:01 -04:00
parent 2d5cbf1dad
commit 3235adf17a
1 changed files with 11 additions and 11 deletions
@@ -62,7 +62,7 @@ models                       |
 | Mistral Base 7B SLiC-HF      | [princeton-nlp/Mistral-7B-Base-SFT-SLiC-HF](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-SLiC-HF) |  10.9  |   8.9  |  7.3 |
 | Mistral Base 7B DPO (Zephyr) | [princeton-nlp/Mistral-7B-Base-SFT-DPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-DPO)     |  15.1  |  12.5  | 10.4 |
 | Mistral Base 7B IPO          | [princeton-nlp/Mistral-7B-Base-SFT-IPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-IPO)     |  11.8  |   9.4  |  7.5 |
-| Mistral Base 7B CPO          | [princeton-nlp/Mistral-7B-Base-SFT-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-CPO)     |  13.1  |   9.1  |  5.6 |
+| Mistral Base 7B CPO          | [princeton-nlp/Mistral-7B-Base-SFT-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-CPO)     |  9.8  |   8.9  |  6.9 |
 | Mistral Base 7B KTO          | [princeton-nlp/Mistral-7B-Base-SFT-KTO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-KTO)     |  13.1  |   9.1  |  5.6 |
 | Mistral Base 7B ORPO         | [kaist-ai/mistral-orpo-beta](https://huggingface.co/kaist-ai/mistral-orpo-beta)                           |  14.7  |  12.2  |  7.0 |
 | Mistral Base 7B R-DPO        | [princeton-nlp/Mistral-7B-Base-SFT-RDPO](https://huggingface.co/princeton-nlp/Mistral-7B-Base-SFT-RDPO)   |  17.4  |  12.8  |  9.9 |
@@ -72,7 +72,7 @@ models                       |
 | Mistral Instruct 7B SLiC-HF  | [princeton-nlp/Mistral-7B-Instruct-SLiC-HF](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-SLiC-HF) |  24.1  |  24.6  | 18.9 |
 | Mistral Instruct 7B DPO      | [princeton-nlp/Mistral-7B-Instruct-DPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-DPO)     |  26.8  |  24.9  | 16.3 |
 | Mistral Instruct 7B IPO      | [princeton-nlp/Mistral-7B-Instruct-IPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-IPO)     |  20.3  |  20.3  | 16.2 |
-| Mistral Instruct 7B CPO      | [princeton-nlp/Mistral-7B-Instruct-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-CPO)     |  24.5  |  23.6  | 17.9 |
+| Mistral Instruct 7B CPO      | [princeton-nlp/Mistral-7B-Instruct-CPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-CPO)     |  23.8  |  28.8  | 22.6 |
 | Mistral Instruct 7B KTO      | [princeton-nlp/Mistral-7B-Instruct-KTO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-KTO)     |  24.5  |  23.6  | 17.9 |
 | Mistral Instruct 7B ORPO     | [princeton-nlp/Mistral-7B-Instruct-ORPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-ORPO)   |  24.5  |  24.9  | 20.8 |
 | Mistral Instruct 7B R-DPO    | [princeton-nlp/Mistral-7B-Instruct-RDPO](https://huggingface.co/princeton-nlp/Mistral-7B-Instruct-RDPO)   |  27.3  |  24.5  | 16.1 |
@@ -82,7 +82,7 @@ models                       |
 | Llama3 Base 8B SLiC-HF       | [princeton-nlp/Llama-3-Base-8B-SLiC-HF](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SLiC-HF)     |  12.1  |  10.1  | 10.3 |
 | Llama3 Base 8B DPO           | [princeton-nlp/Llama-3-Base-8B-SFT-DPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-DPO)     |  18.2  |  15.5  | 15.9 |
 | Llama3 Base 8B IPO           | [princeton-nlp/Llama-3-Base-8B-SFT-IPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-IPO)     |  14.4  |  14.2  | 17.8 |
-| Llama3 Base 8B CPO           | [princeton-nlp/Llama-3-Base-8B-SFT-CPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-CPO)     |  14.4  |  14.2  | 17.8 |
+| Llama3 Base 8B CPO           | [princeton-nlp/Llama-3-Base-8B-SFT-CPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-CPO)     |  10.8  |  8.1  | 5.8 |
 | Llama3 Base 8B KTO           | [princeton-nlp/Llama-3-Base-8B-SFT-KTO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-KTO)     |  14.2  |  12.4  | 12.5 |
 | Llama3 Base 8B ORPO          | [princeton-nlp/Llama-3-Base-8B-SFT-ORPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-ORPO)   |  12.2  |  10.6  | 10.8 |
 | Llama3 Base 8B R-DPO         | [princeton-nlp/Llama-3-Base-8B-SFT-RDPO](https://huggingface.co/princeton-nlp/Llama-3-Base-8B-SFT-RDPO)   |  17.6  |  14.4  | 17.2 |
@@ -103,14 +103,14 @@ We found that using a strong reward model for annotating preference optimization

 | models                       |                                                                                                           | AE2 LC | AE2 WR |  AH  |
 |------------------------------|-----------------------------------------------------------------------------------------------------------|:------:|:------:|:----:|
-| Llama 3 Instruct 8B RRHF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RRHF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RRHF-v0.2) |  26.0  |  25.3  | 22.3 |
-| Llama 3 Instruct 8B SLiC-HF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v0.2) |  37.9  |  31.6  | 28.8 |
-| Llama 3 Instruct 8B DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2) |  33.9  |  32.5  | 29.3 |
-| Llama 3 Instruct 8B IPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2) |  48.2  |  47.5  | 35.2 |
-| Llama 3 Instruct 8B CPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2) |  46.8  |  42.4  | 36.6 |
-| Llama 3 Instruct 8B KTO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2) |  34.1  |  36.4  | 30.9 |
-| Llama 3 Instruct 8B ORPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2) |  34.1  |  32.1  | 27.3 |
-| Llama 3 Instruct 8B R-DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2) |  38.1  |  33.8  | 28.2 |
+| Llama 3 Instruct 8B RRHF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RRHF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RRHF-v0.2) |  37.9  |  31.6  | 28.8 |
+| Llama 3 Instruct 8B SLiC-HF v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v2.0](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SLiC-HF-v0.2) |  33.9  |  32.5  | 29.3 |
+| Llama 3 Instruct 8B DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2) |  48.2  |  47.5  | 35.2 |
+| Llama 3 Instruct 8B IPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-IPO-v0.2) |  46.8  |  42.4  | 36.6 |
+| Llama 3 Instruct 8B CPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-CPO-v0.2) |  34.1  |  36.4  | 30.9 |
+| Llama 3 Instruct 8B KTO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2) |  34.1  |  32.1  | 27.3 |
+| Llama 3 Instruct 8B ORPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2) |  38.1  |  33.8  | 28.2 |
+| Llama 3 Instruct 8B R-DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2) |  48.0  | 45.8  | 35.1 |
 | Llama 3 Instruct 8B SimPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2) |  53.7  |  47.5  | 36.5 |

 ### Use our models for inference