Open-Assistant/model/supervised_finetuning/custom_datasets

Dataset collections overview:

Currently, the datasets can be divided into 3 classes:

  • language knowledge

    • summarization

    • translation

  • dialogue: don't let the user know you are a robot

  • STEM: knowledge about the world

    • coding

    • world knowledge (ideally we want to handle this via prefix context)
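
The classes above could be written down as a small lookup table, for example to tag each dataset with its class. This is only a sketch: the dict layout and the `top_level_class` helper are assumptions made for illustration and are not part of the existing code.

```python
# Taxonomy copied from the list above; the nested-dict layout is an assumption.
DATASET_CLASSES = {
    "language knowledge": ["summarization", "translation"],
    "dialogue": [],
    "STEM": ["coding", "world knowledge"],
}


def top_level_class(task: str) -> str:
    """Return the top-level class that a task (or class name) belongs to."""
    for cls, subtasks in DATASET_CLASSES.items():
        if task == cls or task in subtasks:
            return cls
    raise KeyError(f"unknown task: {task}")
```

For example, `top_level_class("translation")` returns `"language knowledge"`, and an unknown task raises `KeyError`.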

Issues and TODO:

  • as the number of datasets grows, how can we reduce how often this section needs to be updated?

  • ideally we would only update the config YAML, and new datasets would be downloaded from the hub

    • one possible idea is to upload the transformed format of these datasets to the OA hub
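
A minimal sketch of the config-driven idea is given below. It assumes that "the hub" means the Hugging Face Hub and that the datasets there are already in their transformed format. The config keys (`datasets`, `name`, `hub_id`), the example entry, and both function names are hypothetical, chosen only for illustration. The config is shown as the dict that a YAML file would parse to, e.g. via `yaml.safe_load`.

```python
from typing import Any, Callable, Dict

# Hypothetical parsed config; in practice this would come from the config YAML.
EXAMPLE_CONFIG = {
    "datasets": [
        # Both the name and the hub id are placeholders, not real datasets.
        {"name": "example_summarization", "hub_id": "OpenAssistant/example_summarization"},
    ]
}


def _load_from_hub(hub_id: str) -> Any:
    """Default loader: fetch a dataset by id from the Hugging Face Hub."""
    # Imported lazily so the rest of the sketch works without the library installed.
    from datasets import load_dataset

    return load_dataset(hub_id)


def load_configured_datasets(
    config: Dict[str, Any],
    loader: Callable[[str], Any] = _load_from_hub,
) -> Dict[str, Any]:
    """Load every dataset listed in the config, keyed by its configured name."""
    loaded = {}
    for entry in config.get("datasets", []):
        loaded[entry["name"]] = loader(entry["hub_id"])
    return loaded
```

With this shape, adding a dataset would only mean adding one entry to the config. The `loader` argument is there so the function can be tried with a stub instead of a real download.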