Dataset collections overview:
Currently, datasets can be divided into 3 classes:
- language knowledge
  - summarization
  - translation
- dialogue: don't let the user know you are a robot
- STEM: knowledge about the world
  - coding
  - world knowledge (ideally we want to handle this via prefix context)

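As a rough illustration, the three classes could be written down as a small mapping so that a dataset's subcategory can be resolved to its top-level class. This is only a sketch: the nesting of subcategories under each class is an assumption read from the list above, and `DATASET_CLASSES` and `top_level_class` are hypothetical names, not part of the existing code.

```python
# Hypothetical taxonomy mirroring the three classes described above.
# The grouping of subcategories under each class is an assumption.
DATASET_CLASSES = {
    "language knowledge": ["summarization", "translation"],
    "dialogue": [],
    "STEM": ["coding", "world knowledge"],
}


def top_level_class(category: str) -> str:
    """Return the top-level class for a class name or one of its subcategories."""
    for cls, subcategories in DATASET_CLASSES.items():
        if category == cls or category in subcategories:
            return cls
    raise KeyError(f"unknown category: {category}")
```

For example, `top_level_class("translation")` resolves to `"language knowledge"`, while an unlisted name raises `KeyError`.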
Issues and TODO:
- As the number of datasets grows, how can we update this section less often?
  - Ideally, we only update the config YAML, and new datasets are downloaded from the hub.
  - One possible idea is to upload the transformed format of these datasets to the OA hub.
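
The config-driven idea above might look roughly like the following. This is a minimal sketch under stated assumptions, not the project's implementation: the config layout, the `example-org/example-summaries` hub ID, and the names `EXAMPLE_CONFIG`, `hub_loader`, and `load_from_config` are all hypothetical. The config is shown as the already-parsed dictionary that a YAML parser would return, and the default loader assumes the Hugging Face `datasets` package is installed.

```python
from typing import Any, Callable, Dict

# Hypothetical parsed config, equivalent to a YAML file such as:
#   datasets:
#     example_summaries:
#       hub_id: example-org/example-summaries
#       split: train
EXAMPLE_CONFIG: Dict[str, Any] = {
    "datasets": {
        "example_summaries": {
            "hub_id": "example-org/example-summaries",  # placeholder hub ID
            "split": "train",
        }
    }
}


def hub_loader(hub_id: str, split: str) -> Any:
    """Default loader: fetch an already-transformed dataset from the hub."""
    # Imported lazily so the rest of the module works without the package.
    from datasets import load_dataset

    return load_dataset(hub_id, split=split)


def load_from_config(
    config: Dict[str, Any],
    name: str,
    loader: Callable[[str, str], Any] = hub_loader,
) -> Any:
    """Look up `name` in the config and load it with `loader`."""
    entry = config["datasets"].get(name)
    if entry is None:
        raise KeyError(f"dataset {name!r} is not listed in the config")
    # Fall back to the "train" split when the config does not name one.
    return loader(entry["hub_id"], entry.get("split", "train"))
```

With this shape, adding a dataset would only require a new entry in the config, provided the uploaded data is already in the transformed format. Passing a custom `loader` makes the lookup easy to exercise without network access.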