mirror of
https://github.com/wassname/Open-Assistant.git
synced 2026-07-05 17:30:48 +08:00
# Dataset collections overview:

Currently the datasets can be divided into 3 classes:

- language knowledge
  - summarization
  - translation
  - dialogue: don't let the user know you are a robot
- STEM: knowledge about the world
  - coding
- world knowledge <= ideally we want to handle this via prefix context

Issues and TODO:
- as the number of datasets grows, how can we reduce how often this section needs updating?
  - ideally we only update the config yaml and new datasets are downloaded from the hub
  - one possible idea is to upload the transformed format of these datasets to the OA hub
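
The two ideas above (a config-driven dataset list and a shared transformed format) could look roughly like the sketch below. This is only an illustration under stated assumptions: the config layout, the dataset names, the hub ids, the field names and the `prompt`/`response` schema are all hypothetical and not taken from this repository. The config is written as a Python dict shaped like a parsed YAML file, and the hub download is replaced by an offline stand-in so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical config, shaped like what a parsed YAML file might contain.
# All names, hub ids and field names here are invented for illustration.
CONFIG = {
    "datasets": [
        {
            "name": "example_summarization",
            "hub_id": "example-org/summaries",
            "category": "language knowledge",
            "prompt_field": "article",
            "response_field": "summary",
        },
        {
            "name": "example_coding",
            "hub_id": "example-org/code-qa",
            "category": "STEM",
            "prompt_field": "question",
            "response_field": "answer",
        },
    ]
}

Record = Dict[str, str]
Loader = Callable[[str], Iterable[Record]]


def to_common_format(row: Record, entry: Dict[str, str]) -> Record:
    """Map one raw row onto a shared (assumed) prompt/response schema."""
    return {
        "prompt": row[entry["prompt_field"]],
        "response": row[entry["response_field"]],
        "category": entry["category"],
    }


def load_all(config: dict, loader: Loader) -> Dict[str, List[Record]]:
    """Load every dataset listed in the config and normalize its rows."""
    result: Dict[str, List[Record]] = {}
    for entry in config["datasets"]:
        rows = loader(entry["hub_id"])
        result[entry["name"]] = [to_common_format(r, entry) for r in rows]
    return result


def fake_loader(hub_id: str) -> List[Record]:
    """Offline stand-in for a real hub download."""
    samples = {
        "example-org/summaries": [{"article": "long text", "summary": "short"}],
        "example-org/code-qa": [
            {"question": "reverse a list?", "answer": "xs[::-1]"}
        ],
    }
    return samples[hub_id]


data = load_all(CONFIG, fake_loader)
```

With this shape, adding a dataset would only mean adding one entry to the config. In a real setup the config could be read with `yaml.safe_load` from PyYAML and `fake_loader` swapped for a thin wrapper around `datasets.load_dataset(hub_id, split="train")` from the Hugging Face `datasets` library; both substitutions are assumptions about how this might be wired up, not a description of the existing code.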