# Data Augmentation

(Pull requests welcome.)

## What is data augmentation

Data augmentation is a technique we can use to get better data faster: we use
machine learning models to analyze long texts (such as essays) and compress them
into instructions.
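One way to picture "compressing" a long text into an instruction is to have a model produce a short description of the text and use that description as the instruction, with the original text as the expected response. The sketch below assumes that reading. The `InstructionExample` record, the `text_to_instruction` helper, and the "Write an essay about ..." template are hypothetical, and `describe` stands in for whatever model you use.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class InstructionExample:
    """One instruction/response pair (hypothetical record format)."""

    instruction: str
    response: str


def text_to_instruction(
    text: str, describe: Callable[[str], str]
) -> InstructionExample:
    """Turn a long text into a short instruction whose answer is the text.

    `describe` is any function (for example, a wrapper around a model) that
    returns a short phrase describing `text`. The prompt template below is an
    illustrative assumption, not a project convention.
    """
    topic = describe(text).strip().rstrip(".")
    return InstructionExample(
        instruction=f"Write an essay about {topic}.",
        response=text.strip(),
    )


# Usage with a stand-in describer instead of a real model.
essay = "Bees pollinate many food crops. Without them, yields would fall.\n"
example = text_to_instruction(essay, lambda t: "the role of bees in agriculture")
print(example.instruction)  # Write an essay about the role of bees in agriculture.
```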

## How to contribute

To contribute to data augmentation, you can write a short Python script that
uses a model from Hugging Face to analyze long texts.
[This document](https://docs.google.com/document/d/13a188pPvqnlvuVa3e_suVz4YO5s-JWeiOOrpp0odImg/edit)
lists ideas for what you can do.

Example implementations of two of these ideas are available as Colab notebooks:
[Idea 3](https://colab.research.google.com/drive/1GllCN5PgSYxBxINZsv3A2r0SpdznHlbT?usp=sharing),
[Idea 4](https://colab.research.google.com/drive/1nZx5LRjO61fYprFyqtrwPDLOis6ctR4p#scrollTo=1EE8CriiaCXj)
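The kind of short script described above could start from the minimal sketch below. It assumes the `transformers` library's `pipeline("summarization")` helper; the model name `sshleifer/distilbart-cnn-12-6`, the 2000-character chunk budget, the generation lengths, and the helper names are illustrative assumptions, not project conventions. Long texts are split into chunks first because summarization models accept only a limited number of input tokens; characters serve here only as a rough proxy for tokens.

```python
from typing import Callable, List

# Hypothetical helper names; none of these are part of the Open-Assistant code.


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text on blank lines into chunks of at most `max_chars` characters.

    Paragraphs longer than `max_chars` are cut at the limit.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        # Hard-split a paragraph that alone exceeds the budget.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this paragraph would overflow, so start a new chunk.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_long_text(
    text: str, summarize: Callable[[str], str], max_chars: int = 2000
) -> str:
    """Summarize each chunk separately and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_text(text, max_chars))


def make_hf_summarizer(
    model_name: str = "sshleifer/distilbart-cnn-12-6",
) -> Callable[[str], str]:
    """Wrap a Hugging Face summarization pipeline as a plain function.

    The import happens here so the helpers above work without `transformers`.
    """
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)

    def summarize(chunk: str) -> str:
        # truncation=True guards against chunks that exceed the model limit.
        result = summarizer(chunk, max_length=60, min_length=10, truncation=True)
        return result[0]["summary_text"]

    return summarize
```

To run it on real data, install `transformers` and a backend such as PyTorch, then call `summarize_long_text(essay, make_hf_summarizer())`. Passing the model in as a plain callable keeps the chunking logic independent of any one model, so you can swap in another Hugging Face model, or a stub function when testing.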

To contribute, simply choose one of the ideas from the document above and
implement it.