Open-Assistant/model/supervised_finetuning/custom_datasets/README.md
theblackcat102 22e3ab1a89 [fix] linter fix
2023-01-20 07:23:02 +00:00


# Dataset collections overview:
Currently, the datasets can be divided into 3 classes:

- language knowledge
  - summarization
  - translation
- dialogue: don't let the user know you are a robot
- STEM: knowledge about the world
  - coding
  - world knowledge <= ideally we want to handle this via prefix context

Issues and TODO:

- As the datasets keep growing, how can we update this section less often?
  - Ideally, we can update the config YAML and new datasets will be downloaded
    from the hub.
  - One possible idea is to upload the transformed format of these datasets to
    the OA hub.
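
The config-driven idea from the TODO list could look roughly like the sketch below. It is only an illustration under assumptions: the config layout, the field names (`name`, `category`, `input_field`, `output_field`), the dataset names, and the helpers `to_pairs` and `build_datasets` are all hypothetical and not part of this repository. The config is shown as a plain dict, shaped like what a YAML parser would return, and the hub download is passed in as a `loader` callable so the sketch stays self-contained.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Loader = Callable[[str], Iterable[Row]]
Pair = Tuple[str, str]

# Hypothetical config, shaped like the result of parsing a YAML file.
# Dataset names and field names are placeholders, not real hub entries.
EXAMPLE_CONFIG = {
    "datasets": [
        {"name": "example/summaries", "category": "language knowledge",
         "input_field": "document", "output_field": "summary"},
        {"name": "example/dialogues", "category": "dialogue",
         "input_field": "prompt", "output_field": "reply"},
    ]
}


def to_pairs(rows: Iterable[Row], input_field: str, output_field: str) -> List[Pair]:
    """Transform raw rows into a shared (input, output) format."""
    return [(row[input_field], row[output_field]) for row in rows]


def build_datasets(config: Dict[str, Any], loader: Loader) -> Dict[str, List[Pair]]:
    """Load every dataset listed in the config and group the pairs by category."""
    grouped: Dict[str, List[Pair]] = {}
    for entry in config["datasets"]:
        rows = loader(entry["name"])  # e.g. a function that downloads from a hub
        pairs = to_pairs(rows, entry["input_field"], entry["output_field"])
        grouped.setdefault(entry["category"], []).extend(pairs)
    return grouped
```

With this shape, adding a dataset would mean adding one entry to the config rather than editing this README. In practice `loader` might wrap something like `datasets.load_dataset(name, split="train")` from the Hugging Face `datasets` library, but that wiring is an assumption here.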