wassname/Open-Assistant

mirror of https://github.com/wassname/Open-Assistant.git synced 2026-06-27 16:10:30 +08:00

theblackcat102 bcebbbc49c [merge] Fix conflict

2023-02-11 00:23:25 +00:00

| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | [fix] patch translated history conversation | 2023-02-08 00:20:11 +00:00 |
| `dialogue_collator.py` | [fix] Custom collate_fn for training | 2023-02-03 06:08:01 +00:00 |
| `formatting.py` | [fix] remove spaces in format_pair | 2023-01-23 02:48:47 +00:00 |
| `prompt_dialogue.py` | [feature] Add rallio new instruction dataset v3 | 2023-02-06 00:22:30 +00:00 |
| `qa_datasets.py` | [merge] Fix conflict | 2023-02-11 00:23:25 +00:00 |
| `README.md` | Fix typos (#1143) | 2023-02-05 20:18:03 +01:00 |
| `summarization.py` | [feature] Add mix conversation augmentation | 2023-02-01 22:14:11 +00:00 |
| `toxic_conversation.py` | [fix] prosocial dialogue format error | 2023-01-22 14:00:20 +00:00 |
| `translation.py` | Get working on multi-gpu | 2023-02-08 20:49:25 +00:00 |

README.md

Dataset collections overview:

Currently, the datasets can be divided into 3 classes:

1. language knowledge
   - summarization
   - translation
2. dialogue: don't let the user know you are a robot
3. STEM: knowledge about the world
   - coding
   - world knowledge <= ideally we want to handle this via prefix context

Issues and TODO:

1. As the number of datasets grows, how can we avoid updating this section so often?
2. Ideally, we only update the config YAML and new datasets are downloaded from the hub.
   - One possible idea is to upload the transformed format of these datasets to the OA hub.
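
A minimal sketch of the config-driven idea above, not the repository's actual loader. It assumes the config YAML has already been parsed into a dict and that each dataset has been uploaded to the hub in its transformed format. The config keys (`datasets`, `name`, `hub_id`, `split`), the hub ids, and the function names `hub_loader` and `build_datasets` are all hypothetical; only `datasets.load_dataset` is a real call from the Hugging Face `datasets` library.

```python
from typing import Any, Callable, Dict

# Hypothetical config, shown as the dict a YAML parser would return.
# The keys and hub ids below are illustrative assumptions, not real datasets.
CONFIG: Dict[str, Any] = {
    "datasets": [
        {"name": "summarization_example", "hub_id": "example-org/summarization-pairs"},
        {"name": "translation_example", "hub_id": "example-org/translation-pairs", "split": "train"},
    ]
}


def hub_loader(hub_id: str, split: str) -> Any:
    """Default loader: fetch an already-transformed dataset from the hub."""
    # Imported lazily so the rest of the sketch runs without the library installed.
    from datasets import load_dataset

    return load_dataset(hub_id, split=split)


def build_datasets(
    config: Dict[str, Any],
    loader: Callable[[str, str], Any] = hub_loader,
) -> Dict[str, Any]:
    """Load every dataset listed in the config, keyed by its configured name."""
    loaded: Dict[str, Any] = {}
    for entry in config["datasets"]:
        # Fall back to the "train" split when the config entry does not name one.
        split = entry.get("split", "train")
        loaded[entry["name"]] = loader(entry["hub_id"], split)
    return loaded
```

With this shape, adding a dataset means adding one entry to the config rather than writing a new loader module, and the `loader` argument lets the hub call be swapped out for offline testing.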