Mirror of https://github.com/wassname/SimPO.git (synced 2026-06-27 20:19:50 +08:00)
Update README.md
@@ -42,7 +42,7 @@ For DPO, we use the following hyperparameters for training.
 | Setting                | β    | Learning Rate |
 |------------------------|------|---------------|
 | Mistral-Base           | 0.01 | 5e-7          |
-| Mistral-Instruct       | 0.01 | 2e-7          |
+| Mistral-Instruct       | 0.01 | 5e-7          |
 | Llama3-Base            | 0.01 | 5e-7          |
 | Llama3-Instruct        | 0.01 | 7e-7          |
 | Llama3-Instruct v0.2   | 0.01 | 3e-7          |
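
For readers who want to use these DPO settings programmatically, here is a minimal Python sketch of the table as it stands after this commit (Mistral-Instruct at a learning rate of 5e-7). The names `DPO_HPARAMS` and `get_dpo_hparams` are hypothetical and are not part of the SimPO repository; only the β and learning-rate values come from the README.

```python
# Hypothetical lookup table; the values are copied from the README table
# after this commit. The names are illustrative, not from the repository.
DPO_HPARAMS = {
    "Mistral-Base": {"beta": 0.01, "learning_rate": 5e-7},
    "Mistral-Instruct": {"beta": 0.01, "learning_rate": 5e-7},
    "Llama3-Base": {"beta": 0.01, "learning_rate": 5e-7},
    "Llama3-Instruct": {"beta": 0.01, "learning_rate": 7e-7},
    "Llama3-Instruct v0.2": {"beta": 0.01, "learning_rate": 3e-7},
}


def get_dpo_hparams(setting: str) -> dict:
    """Return a copy of the DPO hyperparameters for one setting."""
    if setting not in DPO_HPARAMS:
        known = ", ".join(sorted(DPO_HPARAMS))
        raise KeyError(f"unknown setting {setting!r}; expected one of: {known}")
    # Copy so callers cannot mutate the shared table by accident.
    return dict(DPO_HPARAMS[setting])
```

Calling `get_dpo_hparams("Mistral-Instruct")` returns the β and learning rate for that row; an unknown setting name raises `KeyError` rather than falling back to a default.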