From 1750cecb54cb8691e26d6b26a9861e9236ddd7a7 Mon Sep 17 00:00:00 2001 From: markcheeky <10684818+markcheeky@users.noreply.github.com> Date: Sun, 1 Jan 2023 13:18:15 +0100 Subject: [PATCH 1/3] Create docs/supervised_datasets.md, suggested by yk in issue 186 --- docs/supervised_datasets.md | 59 +++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+) create mode 100644 docs/supervised_datasets.md diff --git a/docs/supervised_datasets.md b/docs/supervised_datasets.md new file mode 100644 index 00000000..38d4ba2c --- /dev/null +++ b/docs/supervised_datasets.md @@ -0,0 +1,59 @@ +# Supervised datasets + +For discussion about usage of supervised data see issue . + + +## Motivation + +An important part of making the assistant useful is to teach it to understand and follow instructions, and to perform large set of tasks well. + +While RLHF seems like the main ingredient, using existing supervised data might help. + +There are two large-scale projects in the area of instruction-following / multitask learning: Promptsource and Natural Instructions - +these projects crowdsourced templates and turned existing NLP datasets into instruction-following seq2seq form in natural langauge. +They include both long-output training examples like generating a sentence that is a likely consequence of sentence in the prompt, and +short-output, like rating prediction from review. (Pre-)training on such datasets should help model understand and follow instructions +and teach it many abilities neccessary to perform a large set of tasks correctly. However, these data are not dialog-like - they do not +look like a normal conversation. + +There are also supervised dialog datasets such as Blended Skill Talk or SODA. In constrast to instruction-following datasets, dialog data +is not as focused on "academic tasks" or correctness, but encourage the model to respond naturally like a person would. 
+ +### Promptsource +- GitHub: +- paper: [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207) +- project for preparing templates and working with them +- they generated a dataset using the templates: + - + - (with multilingual data but English prompt) + - (with multilingual data and machine-translated prompt) +- they trained zero-shot models (= models for following instructions in the input) + - based on T5 architecture (encoder-decoder) called T0 family (and MT0 for multilingual) + - and based on GPT architecture (decoder-only) called BloomZ family + - Huggingface demo: [T0](https://huggingface.co/bigscience/T0pp), [MT0](https://huggingface.co/bigscience/mt0-large), [BloomZ](https://huggingface.co/bigscience/bloomz), + - GitHub repo for T0: + - GitHub repo for BloomZ and MT0: + + +### Natural instructions +- GitHub: +- paper: [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705) +- they crowdsource directly the data prepared for instruction following (and learning from a few examples) +- the GitHub repo = the dataset. 
It contains jsons +- they trained zero-shot and in-context few-shot models (in multiple sizes): + - mT5 architecture (encoder-decoder, multilingual pretraining) + - Huggingface demo few-shot: + - Huggingface demo zero-shot: + + +### Blended Skill Talk +- used by Facebook in Blenderbot project +- HuggingFace dataset: +- example model trained on it: + + +### SODA +- GitHub: +- paper: + + From 35cc68d7f7cf2dc471488076f5aabe91f84acf22 Mon Sep 17 00:00:00 2001 From: Marek Kadlcik <10684818+markcheeky@users.noreply.github.com> Date: Tue, 3 Jan 2023 21:22:54 +0100 Subject: [PATCH 2/3] fix formatting to make linter happy --- docs/supervised_datasets.md | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/docs/supervised_datasets.md b/docs/supervised_datasets.md index 38d4ba2c..c23e06f1 100644 --- a/docs/supervised_datasets.md +++ b/docs/supervised_datasets.md @@ -2,7 +2,6 @@ For discussion about usage of supervised data see issue . - ## Motivation An important part of making the assistant useful is to teach it to understand and follow instructions, and to perform large set of tasks well. @@ -20,6 +19,7 @@ There are also supervised dialog datasets such as Blended Skill Talk or SODA. In is not as focused on "academic tasks" or correctness, but encourage the model to respond naturally like a person would. 
### Promptsource + - GitHub: - paper: [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207) - project for preparing templates and working with them @@ -29,31 +29,29 @@ is not as focused on "academic tasks" or correctness, but encourage the model to - (with multilingual data and machine-translated prompt) - they trained zero-shot models (= models for following instructions in the input) - based on T5 architecture (encoder-decoder) called T0 family (and MT0 for multilingual) - - and based on GPT architecture (decoder-only) called BloomZ family - - Huggingface demo: [T0](https://huggingface.co/bigscience/T0pp), [MT0](https://huggingface.co/bigscience/mt0-large), [BloomZ](https://huggingface.co/bigscience/bloomz), + - and based on GPT architecture (decoder-only) called BloomZ family + - Huggingface demo: [T0](https://huggingface.co/bigscience/T0pp), [MT0](https://huggingface.co/bigscience/mt0-large), [BloomZ](https://huggingface.co/bigscience/bloomz), - GitHub repo for T0: - GitHub repo for BloomZ and MT0: - ### Natural instructions + - GitHub: - paper: [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705) - they crowdsource directly the data prepared for instruction following (and learning from a few examples) -- the GitHub repo = the dataset. It contains jsons +- the GitHub repo = the dataset. 
It contains jsons - they trained zero-shot and in-context few-shot models (in multiple sizes): - mT5 architecture (encoder-decoder, multilingual pretraining) - Huggingface demo few-shot: - Huggingface demo zero-shot: - ### Blended Skill Talk + - used by Facebook in Blenderbot project - HuggingFace dataset: - example model trained on it: - ### SODA + - GitHub: - paper: - - From c7ff4c6a9731c101f921f1ebc8c6558f8bdd7704 Mon Sep 17 00:00:00 2001 From: Marek Kadlcik <10684818+markcheeky@users.noreply.github.com> Date: Tue, 3 Jan 2023 21:25:54 +0100 Subject: [PATCH 3/3] fix formatting on docs/supervised_data again to make linter happy again because the formatting rules changed in between --- docs/supervised_datasets.md | 66 ++++++++++++++++++++++++------------- 1 file changed, 44 insertions(+), 22 deletions(-) diff --git a/docs/supervised_datasets.md b/docs/supervised_datasets.md index c23e06f1..0f8c986d 100644 --- a/docs/supervised_datasets.md +++ b/docs/supervised_datasets.md @@ -1,55 +1,77 @@ # Supervised datasets -For discussion about usage of supervised data see issue . +For discussion about usage of supervised data see issue +. ## Motivation -An important part of making the assistant useful is to teach it to understand and follow instructions, and to perform large set of tasks well. +An important part of making the assistant useful is to teach it to understand +and follow instructions, and to perform a large set of tasks well. -While RLHF seems like the main ingredient, using existing supervised data might help. +While RLHF seems like the main ingredient, using existing supervised data might +help. -There are two large-scale projects in the area of instruction-following / multitask learning: Promptsource and Natural Instructions - -these projects crowdsourced templates and turned existing NLP datasets into instruction-following seq2seq form in natural langauge. 
-They include both long-output training examples like generating a sentence that is a likely consequence of sentence in the prompt, and -short-output, like rating prediction from review. (Pre-)training on such datasets should help model understand and follow instructions -and teach it many abilities neccessary to perform a large set of tasks correctly. However, these data are not dialog-like - they do not +There are two large-scale projects in the area of instruction-following / +multitask learning: Promptsource and Natural Instructions - these projects +crowdsourced templates and turned existing NLP datasets into +instruction-following seq2seq form in natural language. They include both +long-output training examples like generating a sentence that is a likely +consequence of the sentence in the prompt, and short-output, like rating prediction +from a review. (Pre-)training on such datasets should help the model understand and +follow instructions and teach it many abilities necessary to perform a large +set of tasks correctly. However, these data are not dialog-like - they do not look like a normal conversation. -There are also supervised dialog datasets such as Blended Skill Talk or SODA. In constrast to instruction-following datasets, dialog data -is not as focused on "academic tasks" or correctness, but encourage the model to respond naturally like a person would. +There are also supervised dialog datasets such as Blended Skill Talk or SODA. In +contrast to instruction-following datasets, dialog data is not as focused on +"academic tasks" or correctness, but encourages the model to respond naturally +like a person would. 
### Promptsource - GitHub: -- paper: [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207) +- paper: + [Multitask Prompted Training Enables Zero-Shot Task Generalization](https://arxiv.org/abs/2110.08207) - project for preparing templates and working with them - they generated a dataset using the templates: - - - (with multilingual data but English prompt) - - (with multilingual data and machine-translated prompt) -- they trained zero-shot models (= models for following instructions in the input) - - based on T5 architecture (encoder-decoder) called T0 family (and MT0 for multilingual) + - (with multilingual data but + English prompt) + - (with multilingual data + and machine-translated prompt) +- they trained zero-shot models (= models for following instructions in the + input) + - based on T5 architecture (encoder-decoder) called T0 family (and MT0 for + multilingual) - and based on GPT architecture (decoder-only) called BloomZ family - - Huggingface demo: [T0](https://huggingface.co/bigscience/T0pp), [MT0](https://huggingface.co/bigscience/mt0-large), [BloomZ](https://huggingface.co/bigscience/bloomz), + - Huggingface demo: [T0](https://huggingface.co/bigscience/T0pp), + [MT0](https://huggingface.co/bigscience/mt0-large), + [BloomZ](https://huggingface.co/bigscience/bloomz), - GitHub repo for T0: - - GitHub repo for BloomZ and MT0: + - GitHub repo for BloomZ and MT0: + ### Natural instructions - GitHub: -- paper: [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705) -- they crowdsource directly the data prepared for instruction following (and learning from a few examples) +- paper: + [Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks](https://arxiv.org/abs/2204.07705) +- they directly crowdsource the data prepared for instruction following (and + learning from a few examples) - the GitHub repo = the 
dataset. It contains jsons - they trained zero-shot and in-context few-shot models (in multiple sizes): - mT5 architecture (encoder-decoder, multilingual pretraining) - - Huggingface demo few-shot: - - Huggingface demo zero-shot: + - Huggingface demo few-shot: + + - Huggingface demo zero-shot: + ### Blended Skill Talk - used by Facebook in Blenderbot project - HuggingFace dataset: -- example model trained on it: +- example model trained on it: + ### SODA