Update README.md

2026-06-27 19:29:20 +08:00 · 2024-07-09 09:12:44 -04:00
parent efe419e587
commit 005072f88c
1 changed files with 0 additions and 1 deletions
@@ -100,7 +100,6 @@ We found that using a strong reward model for annotating preference optimization
 | Llama 3 Instruct 8B KTO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-KTO-v0.2) |  34.1  |  36.4  | 30.9 |
 | Llama 3 Instruct 8B ORPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-ORPO-v0.2) |  34.1  |  32.1  | 27.3 |
 | Llama 3 Instruct 8B R-DPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-RDPO-v0.2) |  38.1  |  33.8  | 28.2 |
-| Llama 3 Instruct 8B SimPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2) |  48.0  |  45.8  | 35.1 |
 | Llama 3 Instruct 8B SimPO v0.2 | [princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2) |  53.7  |  47.5  | 36.5 |

 ### Use our models for inference