From 2d5cbf1dad44eb6b5c3536de48e9563ef21e6eb9 Mon Sep 17 00:00:00 2001
From: Yu Meng
Date: Tue, 9 Jul 2024 14:24:12 -0400
Subject: [PATCH] Update README.md

---
 README.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index 4b747f5..bbcf787 100644
--- a/README.md
+++ b/README.md
@@ -29,15 +29,16 @@ We used the following hyperparameters for training the released models (note tha
 | Mistral-Instruct | 2.5 | 0.1 | 5e-7 |
 | Llama3-Base | 2.0 | 0.5 | 6e-7 |
 | Llama3-Instruct | 2.5 | 0.55 | 1e-6 |
+| Llama3-Instruct v0.2 | 10 | 0.3 | 1e-6 |
 
 For DPO, we use the following hyperparameters for training.
 
 | Setting | β | Learning Rate |
 |------------------------|------|---------------|
-| mistral-base | 0.01 | 5e-7 |
-| mistral-instruct | 0.01 | 2e-7 |
-| llama3-base | 0.01 | 5e-7 |
-| llama3-instruct | 0.01 | 7e-7 |
-| llama3-instruct v0.2 | 0.01 | 3e-7 |
+| Mistral-Base | 0.01 | 5e-7 |
+| Mistral-Instruct | 0.01 | 2e-7 |
+| Llama3-Base | 0.01 | 5e-7 |
+| Llama3-Instruct | 0.01 | 7e-7 |
+| Llama3-Instruct v0.2 | 0.01 | 3e-7 |
 
 ### Training and evaluation consistency in BOS