From dad051a2f6f5a17b317b5306c65f41d5a07853cc Mon Sep 17 00:00:00 2001
From: Yu Meng
Date: Sat, 20 Jul 2024 00:34:31 -0400
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 278dd6f..3597ef9 100644
--- a/README.md
+++ b/README.md
@@ -5,7 +5,7 @@ This repository contains the code and released models for our paper [SimPO: Simp
 ## 🆕 Changelog
-- **[2024.07.17]** We released a new SimPO model [gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) by fine-tuning Google's gemma-2 9B model using on-policy [UltraFeedback data](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) annotated by [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1), achieving a **72.4** LC win rate on AlpacaEval 2 (**#[1 on the Leaderboard](https://tatsu-lab.github.io/alpaca_eval/)** 🎉🎉) and a **59.1** win rate on Arena-Hard! Please find the training script [here](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml)!
+- **[2024.07.17]** We released a new SimPO model [gemma-2-9b-it-SimPO](https://huggingface.co/princeton-nlp/gemma-2-9b-it-SimPO) by fine-tuning Google's gemma-2 9B model using on-policy [UltraFeedback data](https://huggingface.co/datasets/princeton-nlp/gemma2-ultrafeedback-armorm) annotated by [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1), achieving a **72.4** LC win rate on AlpacaEval 2 (**#[1 on the Leaderboard](https://tatsu-lab.github.io/alpaca_eval/)** 🎉🎉) and a **59.1** win rate on Arena-Hard! Please find the training script [here](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/gemma-2-9b-it-simpo.yaml) and the data generation scripts [here](https://github.com/princeton-nlp/SimPO/tree/main/on_policy_data_gen)!
 - **[2024.07.08]** We updated our paper ([v2](https://arxiv.org/abs/2405.14734v2))
   - Additional baselines (RRHF, SLiC-HF, CPO)
   - New Llama3-Instruct setting (v0.2) with [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the preference label annotator, yielding a better-performing model, [Llama-3-Instruct-8B-SimPO-v0.2](https://huggingface.co/princeton-nlp/Llama-3-Instruct-8B-SimPO-v0.2), with a **53.7** LC win rate on AlpacaEval 2 and a **36.5** win rate on Arena-Hard ([training script](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/llama-3-8b-instruct-simpo-v2.yaml))!