
Experiment using LLMs to detect BS writing

An experiment to see if I can detect "BS" writing using LLMs in a robust way.

It's impossible to detect AI-generated text with 100% accuracy. But we should have more success detecting bad, empty, or contentless text.

Why might this work? As with Schmidhuber's definition of curiosity, good writing should initially surprise the reader but become less surprising as they learn from it. Empty writing is not surprising. And confusing writing stays confusing even after you have read it. In other words, good writing should have a high perplexity, which goes down once the model has learned the content.
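As a reminder of what is being measured, here is a minimal sketch of perplexity computed from per-token log-probabilities. It assumes you already have natural-log probabilities for each token from some language model; the function name `perplexity` is illustrative and is not taken from the notebooks.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood per token).

    token_logprobs: natural-log probabilities the model assigned
    to each observed token (hypothetical input for this sketch).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that gives every token probability 0.5 has perplexity 2.
coin_flip = perplexity([math.log(0.5)] * 4)
```

Lower perplexity means the model found the text more predictable; a drop in perplexity after learning is the "less surprising" signal described above.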

The ideal way to do this would be by fine-tuning, but that would be memory intensive, so I will try to do it with prompts. I ask:

  • Is the text surprising (high perplexity)?
  • Is it less surprising when given a summary (lower perplexity)?

If yes/yes, then it may be surprising new information. If either is not true, then it's probably BS.
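The two questions above can be sketched as a small decision rule. This is only an illustration: the function name `classify` and both thresholds (`min_ppl`, `min_rel_gain`) are arbitrary assumptions, not values used in the notebooks.

```python
def classify(ppl_alone, ppl_with_summary, min_ppl=10.0, min_rel_gain=0.02):
    """Label a text from two perplexity measurements.

    min_ppl and min_rel_gain are hypothetical cut-offs chosen for
    illustration only; they would need tuning on real data.
    """
    surprising = ppl_alone >= min_ppl
    rel_gain = (ppl_alone - ppl_with_summary) / ppl_alone
    learnable = rel_gain >= min_rel_gain
    if not surprising:
        return "empty"        # predictable text: nothing new to learn
    if not learnable:
        return "confusing"    # stays surprising even with the summary
    return "informative"      # yes/yes: surprising, then less so

label = classify(30.0, 27.0)  # 10% relative drop on a surprising text
```

Both "empty" and "confusing" correspond to the "probably BS" outcome; only the yes/yes case is treated as new information.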

See main.ipynb for the code and results.

Results using adapter fine tuning

I fine-tuned the model on the first half of a text, then tested it on the second half. I measure how much it learned by the decrease in perplexity: improvement is the absolute decrease, and improvement% is the decrease as a fraction of the before value. The rows with a high perplexity are unpredictable, and the ones with a higher improvement are learnable. Unpredictable and learnable text is not BS.

| name | before | after | in_training | len | improvement% | improvement |
|---|---|---|---|---|---|---|
| wikipedia on LK-99 | 32.219 | 28.8525 | False | 1038 | 0.104489 | 3.36652 |
| Theory o. general relativity | 26.952 | 24.5425 | True | 1378 | 0.0894 | 2.40951 |
| good_ml | 28.3473 | 26.4566 | False | 1004 | 0.0666997 | 1.89076 |
| enron_email1 | 25.7697 | 24.3904 | True | 445 | 0.0535253 | 1.37933 |
| openai_board_ann | 15.904 | 15.1736 | False | 1191 | 0.0459214 | 0.730332 |
| Schmidhuber 2023 Subjective Novelty, Surprise | 29.615 | 28.4708 | False | 2654 | 0.0386353 | 1.14418 |
| email_to_fauci | 25.0893 | 24.3714 | False | 1559 | 0.0286154 | 0.717941 |
| sokal hoax | 15.9664 | 15.7148 | True | 2487 | 0.0157617 | 0.251658 |
| AI gen fake paper | 7.63283 | 7.57951 | False | 2031 | 0.00698672 | 0.0533285 |
| lorem ipsum | 1.60166 | 1.59538 | True | 445 | 0.00392053 | 0.00627935 |
| bad_ml | 13.9061 | 13.8623 | False | 2345 | 0.00314972 | 0.0438004 |
| I have a dream | 2.12726 | 2.12344 | True | 848 | 0.00179583 | 0.00382018 |

For example, the Wikipedia extract (wikipedia on LK-99) is unpredictable (high before perplexity) and learnable (large improvement in perplexity). That makes sense, as it's a new topic. In contrast, lorem ipsum has a low perplexity, meaning it's predictable or memorized. That makes sense, as this text was likely in the training corpus. The AI gen fake paper has a low perplexity because it's predictable, even though it is new.
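The two improvement columns can be reproduced from the before and after perplexities. The sketch below is a plain recomputation, assuming improvement is `before - after` and improvement% is that difference divided by `before` (which matches the table values); the helper name `improvement_columns` is made up for this example.

```python
def improvement_columns(before, after):
    """Return (relative improvement, absolute improvement) in perplexity."""
    improvement = before - after
    return improvement / before, improvement

# Values copied from the table above.
rows = {
    "wikipedia on LK-99": (32.219, 28.8525),
    "lorem ipsum": (1.60166, 1.59538),
}
results = {name: improvement_columns(b, a) for name, (b, a) in rows.items()}

# Rank texts by how much the model learned, most learnable first.
ranked = sorted(results, key=lambda name: results[name][0], reverse=True)
```

Sorting by the relative column is what orders the table, so a text needs both a high starting perplexity and a large drop to appear near the top.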

See more in 01_detection_using_adapter_ft.ipynb

Results using prompting

When using microsoft/phi-2, we get the following perplexity reduction from including a summary of the key learnings in the prompt:

| | sample | learning% |
|---|---|---|
| 3 | einsteins theory of general relativity | 0.0751468 |
| 5 | wikipedia on LK-99 | 0.0674738 |
| 8 | Schmidhuber 2023 Subjective Novelty, Surprise | 0.0396319 |
| 1 | good_ml | 0.0321225 |
| 0 | bad_ml | -9.58801e-05 |
| 2 | sokal hoax | -0.0168107 |
| 7 | AI gen fake paper | -0.134864 |
| 4 | lorem ipsum | -0.69694 |
| 6 | I have a dream | -0.796421 |

As you can see, some of these are probably in the training set.
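One way to compute a score like learning% is sketched below. It assumes the score is the relative drop in perplexity of the text when a summary is prepended to the prompt, and that only the text's own tokens are scored (the summary tokens are skipped). Both points are assumptions, as are the function names; the notebook may define the metric differently.

```python
import math

def continuation_perplexity(logprobs, n_prefix=0):
    """Perplexity over the tokens after the first n_prefix tokens."""
    scored = logprobs[n_prefix:]
    if not scored:
        raise ValueError("no tokens left to score after the prefix")
    return math.exp(-sum(scored) / len(scored))

def learning_fraction(logprobs_plain, logprobs_with_summary, n_summary_tokens):
    """Relative perplexity drop from prepending a summary (assumed formula)."""
    base = continuation_perplexity(logprobs_plain)
    conditioned = continuation_perplexity(logprobs_with_summary, n_summary_tokens)
    return (base - conditioned) / base

# Toy numbers: the summary lifts each text token's probability from 0.25 to 0.5.
plain = [math.log(0.25)] * 4
with_summary = [math.log(0.9)] * 2 + [math.log(0.5)] * 4  # 2 summary tokens first
score = learning_fraction(plain, with_summary, n_summary_tokens=2)
```

Under this definition a negative score means the summary made the text less predictable, which is what the lower rows of the table show.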

See more in 02_detection_using_tldr_prompt.ipynb

Citing

If you like our work and end up using this code for your research, give us a shout-out by citing or acknowledging:

@misc{wassname2024,
  author = {Clark, M.J.},
  title = {BS Writing Detector},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/wassname/detect_bs_text}},
  commit = {}
}