diff --git a/notebooks/detoxify-evaluation/README.md b/notebooks/detoxify-evaluation/README.md
index 437918d0..c56c2600 100644
--- a/notebooks/detoxify-evaluation/README.md
+++ b/notebooks/detoxify-evaluation/README.md
@@ -1,33 +1,25 @@
-
 # Detoxify evaluation
 [Detoxify](https://github.com/unitaryai/detoxify) is a open source model used to identify prompts as toxic
-
-
 Image from detoxify github that shows the example input/output of their model
-
-
-
 It contains 3 different models that vary in transformer type and data it was trained on
-
-
-
+| Model name | Transformer type | Data from |
+| :----------: | :---------------: | :----------------------------------------: |
+| original | bert-base-uncased | Toxic Comment Classification Challenge |
+| unbiased | roberta-base | Unintended Bias in Toxicity Classification |
+| multilingual | xlm-roberta-base | Multilingual Toxic Comment Classification |
-| Model name | Transformer type | Data from |
-| :---: | :---: | :---: |
-|original| bert-base-uncased| Toxic Comment Classification Challenge|
-|unbiased| roberta-base| Unintended Bias in Toxicity Classification|
-|multilingual| xlm-roberta-base| Multilingual Toxic Comment Classification|
 Unbiased and original models also have a 'small' version - but since normal models are not memory heavy, and small models perform noticably worse, they are only described in the notebook
 ## All tests below were ran on a 3090TI
 # Inference and training times and memory usages
+
 Charts showing detailed memory usages and times for different sentence lengths and batch sizes are inside the notebook
-Quick overview batch size 16, sentence length 4k for training, batch size 128 sentence length 4k for inference
+Quick overview batch size 16, sentence length 4k for training, batch size 128 sentence length 4k for inference
 | Model name | Training memory| Training speed | Inference Memory| Inference Speed|
 | :---: | :---: | :---: |:---: | :---: |
 |original| 11.8GB | 2.40s| 4.8GB|16.48s|
@@ -35,25 +27,27 @@ Quick overview batch size 16, sentence length 4k for training, batch size 128 se
 |multilingual|14GB| 1.00s| 5.5GB| 4.89s|
 # Filtering quality
+
 Detoxify was tested on 4 different types of inputs
+
 - Not obviously toxic
 - Not obviously non-toxic
 - Obviously toxic
 - Obviously non-toxic
 ### Sentences used for testing and rating are contained inside the .ipynb
-| Model name | Not obviously toxic| Not obviously non-toxic | Obviously toxic| Obviously non-toxic|
-| :---: | :---: | :---: |:---: | :---: |
-|original| failed at all, easily accepted racist, sexist overally toxic prompts that were well formulated |Very sensitive on swear words, failed to reckognize context| good performance|good performance|
-|unbiased|Managed to find some hidden toxicity but not on all sentences| Very sensitive explicit language but shown ability to recognize context| Did well but failed to reckognize some gender stereotype mockery | good performance
-|multilingual|Managed to find some hidden toxicity but not on all sentences| Very sensitive explicit language but shown ability to recognize context| Did well but failed to reckognize some gender stereotype mockery | good performance
-Subjectivly 'unbiased' looks like the best performing model.
+| Model name | Not obviously toxic | Not obviously non-toxic | Obviously toxic | Obviously non-toxic |
+| :----------: | :--------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :--------------------------------------------------------------: | :-----------------: |
+| original | failed at all, easily accepted racist, sexist overally toxic prompts that were well formulated | Very sensitive on swear words, failed to reckognize context | good performance | good performance |
+| unbiased | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to reckognize some gender stereotype mockery | good performance |
+| multilingual | Managed to find some hidden toxicity but not on all sentences | Very sensitive explicit language but shown ability to recognize context | Did well but failed to reckognize some gender stereotype mockery | good performance |
-I don't think it would do well as a security layer in a live version of open assistant unless we do some finetuning first, because it can be fooled to pass toxicity if it's presented in formal language.
+Subjectivly 'unbiased' looks like the best performing model.
-With some caution it can be used to filter prompts but I would suggest also using someone for verification of messages that are marked as toxic but still below 90% confidence
+I don't think it would do well as a security layer in a live version of open assistant unless we do some finetuning first, because it can be fooled to pass toxicity if it's presented in formal language.
+With some caution it can be used to filter prompts but I would suggest also using someone for verification of messages that are marked as toxic but still below 90% confidence
 # Licensing
@@ -71,16 +65,12 @@ With some caution it can be used to filter prompts but I would suggest also usin
 - Private use
-
-
 #### You cannot
 - Hold the owner liable
 - Use the owner's trademark
-
-
 #### You must
 - Include Copyright
@@ -91,21 +81,20 @@ With some caution it can be used to filter prompts but I would suggest also usin
 - Include notice
-
-
 This is obviously not legal advice.
-
 # Hosting
-The model is currently available on [huggingface](https://huggingface.co/unitary) and torch hub
+The model is currently available on [huggingface](https://huggingface.co/unitary) and torch hub
+
 ```
 torch.hub.load('unitaryai/detoxify',model)
 ```
- where model is one of:
+
+where model is one of:
 - toxic_bert
 - unbiased_toxic_roberta
-- multilingual_toxic_xlm_r
\ No newline at end of file
+- multilingual_toxic_xlm_r
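
The `torch.hub.load` call in the Hosting section could be wrapped roughly as follows. This is only a sketch: `load_detoxify`, `TORCH_HUB_REPO` and `HUB_MODELS` are hypothetical names introduced here for illustration and are not part of the README or of Detoxify. The repository string and the three model names are the ones listed in the README. The sketch assumes `torch` is installed and that the machine can reach the network on first use.

```python
# Repository and entry-point names as listed in the README's Hosting section.
TORCH_HUB_REPO = "unitaryai/detoxify"
HUB_MODELS = ("toxic_bert", "unbiased_toxic_roberta", "multilingual_toxic_xlm_r")


def load_detoxify(model: str):
    """Load one of the hosted checkpoints through torch hub.

    Hypothetical helper: the name is checked first so that a typo fails
    immediately, before any download is attempted.
    """
    if model not in HUB_MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {HUB_MODELS}")

    # Imported lazily so the name check above works even without torch installed.
    import torch

    return torch.hub.load(TORCH_HUB_REPO, model)
```

Calling `load_detoxify("unbiased_toxic_roberta")` would fetch the checkpoint the evaluation found to perform best. A misspelled name raises `ValueError` without touching the network.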