wassname/Open-Assistant

mirror of https://github.com/wassname/Open-Assistant.git synced 2026-06-27 16:10:30 +08:00

theblackcat102 736f46fb00 [fix] prosocial dialogue format error

2023-01-22 14:00:20 +00:00

__init__.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

dialogue_collator.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

formatting.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

prompt_dialogue.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

qa_datasets.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

README.md

[fix] linter fix

2023-01-20 07:23:02 +00:00

summarization.py

[feature] move data formatting into dataset, instead of collator

2023-01-21 03:31:35 +00:00

toxic_conversation.py

[fix] prosocial dialogue format error

2023-01-22 14:00:20 +00:00

translation.py

[feature] add pythia and limit translation pair

2023-01-22 00:56:17 +00:00

README.md

Dataset collections overview:

Currently, the datasets can be divided into 3 classes:

- language knowledge
  - summarization
  - translation
- dialogue: don't let the user know you are a robot
- STEM: knowledge about the world
  - coding
  - world knowledge <= ideally we want to handle this via prefix context

Issues and TODO:

- As the number of datasets grows, how can we reduce how often this section needs updating?
- Ideally, we would only update the config YAML, and any new dataset would be downloaded from the hub.
  - One possible idea is to upload the transformed format of these datasets to the OA hub.
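
The config-driven idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: the config layout, the field names (`name`, `hub_id`, `category`), the dataset entries, and the helpers `parse_config` and `load_all` are all hypothetical and not part of this repository. The config is written as a plain dict standing in for a parsed YAML file, and the loader is injected so the example runs without network access.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical config, shaped like what a parsed YAML file might contain.
# The names and hub IDs below are placeholders, not real datasets.
CONFIG = {
    "datasets": [
        {"name": "example_summarization", "hub_id": "OA/example_summarization", "category": "language"},
        {"name": "example_dialogue", "hub_id": "OA/example_dialogue", "category": "dialogue"},
    ]
}


@dataclass
class DatasetSpec:
    """One dataset entry from the (assumed) config."""
    name: str
    hub_id: str
    category: str


def parse_config(config: dict) -> List[DatasetSpec]:
    """Turn the raw config entries into typed specs."""
    return [DatasetSpec(**entry) for entry in config["datasets"]]


def load_all(specs: List[DatasetSpec], loader: Callable[[str], object]) -> Dict[str, object]:
    """Load every configured dataset with the given loader, keyed by name."""
    return {spec.name: loader(spec.hub_id) for spec in specs}


def fake_loader(hub_id: str) -> List[str]:
    """Stand-in loader so the sketch runs offline."""
    return [f"row from {hub_id}"]


specs = parse_config(CONFIG)
loaded = load_all(specs, fake_loader)
print(sorted(loaded))
```

With this shape, adding a dataset would mean adding one config entry rather than editing code or this README. In practice the loader could be something like `datasets.load_dataset` from the Hugging Face `datasets` library, assuming the pre-transformed data has been uploaded to the hub.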