From 3690ad3b2e0b2338481ee2f95689392b681b908b Mon Sep 17 00:00:00 2001
From: xiamengzhou <mengzhou@princeton.edu>
Date: Wed, 17 Jul 2024 13:37:04 -0400
Subject: [PATCH] Update README.md

---
 README.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/README.md b/README.md
index b042795..6a2d75b 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ This repository contains the code and released models for our paper [SimPO: Simp
 <img src="./SimPO.png" width="1000px"></img>
 
 ## 🆕 Changelog 
-- [2024.07.17] We released a new SimPO model [gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) by fine-tuning Google's gemma-2 9B model using on-policy [UltraFeedback data](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) annotated by [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1), achieving a **72.4** LC win rate on AlpacaEval 2 (**#1 on the Leaderboard**🎉🎉) and a **59.1** win rate on Arena-Hard! Please find the training script ([here](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml))!
+- [2024.07.17] We released a new SimPO model [gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) by fine-tuning Google's gemma-2 9B model using on-policy [UltraFeedback data](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) annotated by [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1), achieving a **72.4** LC win rate on AlpacaEval 2 (**#1 on the Leaderboard**🎉🎉) and a **59.1** win rate on Arena-Hard! Please find the training script [here](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml)!
 - [2024.07.08] We updated our paper ([v2](https://arxiv.org/abs/2405.14734v2))
   - Additional baselines (RRHF, SLiC-HF, CPO) 
   - New Llama3-Instruct setting (v0.2) with [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the preference label annotator, yielding a better-performing model, [Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2), with a **53.7** LC win rate on AlpacaEval 2 and a **36.5** win rate on Arena-Hard ([training script](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/llama-3-8b-instruct-simpo-v2.yaml))!
@@ -78,7 +78,8 @@ We release the following two models that are built on top of the strong [google/
 |  [princeton-nlp/gemma-2-9b-it-DPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-DPO)  |  67.8  |  65.4  |    2016    | 58.9 |    717    | 88.5 |     392    | 72.2 |     624     |
 | [princeton-nlp/gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) |  72.4  |  65.9  |    1833    | 59.1 |    693    | 88.0 |     341    | 72.2 |     441     |
 
-Compared to the llama3 models, we found that the gemma models exhibit significantly less catastrophic forgetting on math tasks (e.g., GSM) and MMLU, despite the ultrafeedback dataset having limited math-related data. This demonstrates that the [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) model is more suitable for continued preference optimization.
+- Compared to the llama3 models, we found that the gemma models exhibit significantly less catastrophic forgetting on math tasks (e.g., GSM) and MMLU, despite the ultrafeedback dataset having limited math-related data. This demonstrates that the [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) model is more suitable for continued preference optimization.
+- SimPO and DPO perform comparably across all benchmarks, but SimPO is inherently simpler and less resource-intensive.
 
 
 ### v0.2