

Dataset collections overview:

Currently, the datasets can be divided into 3 classes:

  • language knowledge

    • summarization

    • translation

  • dialogue: don't let the user know you are a robot

  • STEM: knowledge about the world

    • coding

    • world knowledge (ideally we want to handle this via prefix context)

Issues and TODO:

  • as the number of datasets grows, how can we update this section less often?

  • ideally we can update the config YAML and new datasets will be downloaded from the hub

    • one possible idea is to upload the transformed format of these datasets to the OA hub
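
The config-driven idea in the TODO list could look roughly like the sketch below. This is only an illustration under assumptions: the config layout (class name mapped to a list of hub dataset ids), the dataset ids, and the names `EXAMPLE_CONFIG`, `load_from_config`, and `fake_loader` are all hypothetical and do not come from this repository. The dict stands in for an already-parsed config YAML, and the loader is passed in as a function so the sketch runs offline.

```python
from typing import Any, Callable, Dict, List

# Hypothetical parsed form of a config YAML: dataset class -> hub dataset ids.
# The ids are placeholders for illustration, not real datasets.
EXAMPLE_CONFIG: Dict[str, List[str]] = {
    "language_knowledge": [
        "example-org/summarization-set",
        "example-org/translation-set",
    ],
    "dialogue": ["example-org/dialogue-set"],
    "stem": ["example-org/coding-set"],
}


def load_from_config(
    config: Dict[str, List[str]],
    loader: Callable[[str], Any],
) -> Dict[str, Dict[str, Any]]:
    """Load every dataset listed in the config, grouped by dataset class."""
    loaded: Dict[str, Dict[str, Any]] = {}
    for dataset_class, hub_ids in config.items():
        loaded[dataset_class] = {hub_id: loader(hub_id) for hub_id in hub_ids}
    return loaded


def fake_loader(hub_id: str) -> List[str]:
    # Stand-in for a real hub download so the sketch runs without network access.
    return [f"sample from {hub_id}"]


datasets_by_class = load_from_config(EXAMPLE_CONFIG, fake_loader)
print(sorted(datasets_by_class))
```

With this shape, adding a dataset would only mean adding one id to the config. If "the hub" refers to the Hugging Face Hub (an assumption), `fake_loader` could be swapped for `datasets.load_dataset`, provided the uploaded datasets are already in the transformed format.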