From 995fbaf260956839f44e9ec2c4d9db6b50c96211 Mon Sep 17 00:00:00 2001
From: Yu Meng
Date: Tue, 9 Jul 2024 14:55:28 -0400
Subject: [PATCH] Update README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 37b4454..e3d3fa4 100644
--- a/README.md
+++ b/README.md
@@ -7,8 +7,8 @@ This repository contains the code and released models for our paper [SimPO: Simp
 ## 🆕 Changelog
 - [2024.07.08] We updated our paper ([v2](https://arxiv.org/abs/2405.14734v2))
     - Additional baselines (RRHF, SLiC-HF, CPO)
-    - New Llama3-Instruct setting (v0.2) with [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the preference label annotator, yielding better-performing models!
-    - [SimPO trainer](https://github.com/princeton-nlp/SimPO/blob/main/scripts/simpo_trainer.py) update for better reproducibility
+    - New Llama3-Instruct setting (v0.2) with [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the preference label annotator, yielding better-performing models ([training script](https://github.com/princeton-nlp/SimPO/blob/main/training_configs/llama-3-8b-instruct-simpo-v2.yaml))!
+    - [SimPO trainer](https://github.com/princeton-nlp/SimPO/blob/main/scripts/simpo_trainer.py) update for better reproducibility. The hyperparameter `gamma` changed to `gamma_beta_ratio` for easier tuning.
 
 ## 🔗 Quick Links
 - [SimPO: Simple Preference Optimization with a Reference-Free Reward](#simple-preference-optimization-simpo)
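
Note on the `gamma` to `gamma_beta_ratio` rename: the patch does not spell out how the two are related. The sketch below assumes, from the name alone, that `gamma_beta_ratio` is the old `gamma` divided by `beta`; check this against the linked trainer before relying on it. The helper names and the numeric values are hypothetical and only illustrate how an existing config might be migrated.

```python
def to_gamma_beta_ratio(gamma: float, beta: float) -> float:
    """Convert an old-style `gamma` into a `gamma_beta_ratio` value.

    Assumption (not confirmed by the patch): gamma_beta_ratio == gamma / beta.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    return gamma / beta


def to_gamma(gamma_beta_ratio: float, beta: float) -> float:
    """Inverse conversion under the same assumption: gamma == ratio * beta."""
    return gamma_beta_ratio * beta


# Hypothetical values from an old config, chosen only for illustration.
old_beta, old_gamma = 2.0, 1.0
ratio = to_gamma_beta_ratio(old_gamma, old_beta)
print(f"gamma_beta_ratio = {ratio}")
```

Under this reading, the margin scales with `beta` automatically, so `beta` can be tuned without re-tuning the margin by hand, which would fit the "easier tuning" remark in the changelog entry.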