From fa8cf79b83f2822652eebe01dc4a3393242e256f Mon Sep 17 00:00:00 2001
From: wassname
Date: Sat, 22 Apr 2023 11:29:29 +0800
Subject: [PATCH] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6c79729..876409d 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@ Data can generally be divided along two axis:
 - high quality 🗹 or Lower quality ☐
 - natural 🧑 or unnatural 🤖
-Depending on your training objectives you will want lots of low quality instruction data, or a small amount of high quality data. Which should you use? Lets see what Anthropic have to say [Askell et all Antrhopic](https://arxiv.org/abs/2112.00861)]:
+Depending on your training objectives you will want lots of low quality instruction data, or a small amount of high quality data. Which should you use? Lets see what Anthropic have to say in [Askell et al](https://arxiv.org/abs/2112.00861):
 > **How can we improve the sample efficiency of preference modeling?** We find that we can significantly improve sample efficiency using a ‘preference model pre-training’ (PMP) stage of training, where we first pre-train on large public datasets that encode human preference information, such as Stack Exchange, Reddit, and Wikipedia edits, before finetuning on smaller datasets encoding more specific human preferences.