diff --git a/README.md b/README.md
index 5413cfb..37b4454 100644
--- a/README.md
+++ b/README.md
@@ -4,8 +4,15 @@ This repository contains the code and released models for our paper [SimPO: Simp
+## 🆕 Changelog
+- [2024.07.08] We updated our paper ([v2](https://arxiv.org/abs/2405.14734v2))
+  - Additional baselines (RRHF, SLiC-HF, CPO)
+  - New Llama3-Instruct setting (v0.2) with [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1) as the preference label annotator, yielding better-performing models!
+  - [SimPO trainer](https://github.com/princeton-nlp/SimPO/blob/main/scripts/simpo_trainer.py) update for better reproducibility
+
 ## 🔗 Quick Links
 - [SimPO: Simple Preference Optimization with a Reference-Free Reward](#simple-preference-optimization-simpo)
+  - [Changelog](#-changelog)
   - [Tips for Running SimPO](#tips-for-running-simpo)
   - [Released Models](#released-models)
   - [Install Requirements](#install-requirements)