Experiment using LLMs to detect BS writing

An experiment to see if I can detect "BS" using LLMs in a robust way.

It's impossible to detect AI-generated text with 100% accuracy, but we may have more success detecting bad, empty, or contentless text.

Why might this work? As with Schmidhuber's definition of curiosity, good writing should initially surprise the reader but become less surprising as they learn from it. Empty writing is not surprising, and confusing writing stays confusing even after you have read it. In other words, good writing should have a high perplexity that goes down once the model has learned from it.
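To make "perplexity" concrete, here is a minimal sketch of the standard definition: the exponential of the mean negative log-probability per token. The log-probabilities below are made-up illustrative values, not outputs from any model in this repo.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical natural-log probabilities a model assigns to each token.
predictable = [math.log(0.9)] * 5   # the model is confident about every token
surprising = [math.log(0.05)] * 5   # the model rarely expects the next token

print(perplexity(predictable))  # about 1.11: low perplexity, unsurprising text
print(perplexity(surprising))   # about 20: high perplexity, surprising text
```

A perplexity near 1 means the model could almost recite the text; higher values mean each token was harder to predict.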

The ideal way to do this would be by fine-tuning, but that would be memory intensive, so I will try to do it with prompts. I ask:

Is the text surprising (high perplexity)?
Is it less surprising when given a summary (lower perplexity)?

If yes/yes, then it may be surprising new information. If either is not true, then it's probably BS.
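The two questions above can be sketched as a simple decision rule. The thresholds `min_ppl` and `min_drop` are hypothetical values chosen for illustration; the notebooks do not necessarily use fixed cut-offs, and the example inputs are invented.

```python
def classify(ppl_plain, ppl_after, min_ppl=10.0, min_drop=0.02):
    """Apply the yes/yes rule to two perplexity measurements.

    ppl_plain: perplexity of the text on its own.
    ppl_after: perplexity after the model has seen a summary (or learned from the text).
    min_ppl, min_drop: hypothetical thresholds, not taken from the notebooks.
    """
    surprising = ppl_plain >= min_ppl                  # question 1
    drop = (ppl_plain - ppl_after) / ppl_plain         # fractional reduction
    learnable = drop >= min_drop                       # question 2
    if surprising and learnable:
        return "maybe new information"
    return "probably BS"

# Illustrative inputs only.
print(classify(30.0, 27.0))   # surprising and learnable
print(classify(2.0, 1.99))    # not surprising to begin with
print(classify(30.0, 29.9))   # surprising, but stays confusing
```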

See main.ipynb for the code and results.

Results using adapter fine tuning

I fine-tuned the model on the first half of a text, then tested it on the second half. I measured how much it learned by the decrease in perplexity. Rows with a high "before" perplexity are unpredictable, and rows with a large improvement are learnable. Unpredictable and learnable text is not BS.

name	before	after	in_training	len	improvement%	improvement
wikipedia on LK-99	32.219	28.8525	False	1038	0.104489	3.36652
Theory of general relativity	26.952	24.5425	True	1378	0.0894	2.40951
good_ml	28.3473	26.4566	False	1004	0.0666997	1.89076
enron_email1	25.7697	24.3904	True	445	0.0535253	1.37933
openai_board_ann	15.904	15.1736	False	1191	0.0459214	0.730332
Schmidhuber 2023 Subjective Novelty, Surprise	29.615	28.4708	False	2654	0.0386353	1.14418
email_to_fauci	25.0893	24.3714	False	1559	0.0286154	0.717941
sokal hoax	15.9664	15.7148	True	2487	0.0157617	0.251658
AI gen fake paper	7.63283	7.57951	False	2031	0.00698672	0.0533285
lorem ipsum	1.60166	1.59538	True	445	0.00392053	0.00627935
bad_ml	13.9061	13.8623	False	2345	0.00314972	0.0438004
I have a dream	2.12726	2.12344	True	848	0.00179583	0.00382018

For example, the Wikipedia extract "wikipedia on LK-99" is unpredictable (high perplexity before fine-tuning) and learnable (large improvement in perplexity). That makes sense, as it's a new topic. In contrast, lorem ipsum has a low perplexity, meaning it's predictable or memorized. That makes sense, as this text was likely in the training corpus. The AI gen fake paper has a low perplexity because it's predictable, even though it is new.
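The two improvement columns can be reproduced from the before and after columns. As far as the numbers show, improvement is before minus after, and improvement% is that difference divided by before, so it is a fraction rather than a percentage. A short sketch using three rows copied from the table:

```python
# (name, before, after) copied from the results table above.
rows = [
    ("wikipedia on LK-99", 32.219, 28.8525),
    ("AI gen fake paper", 7.63283, 7.57951),
    ("lorem ipsum", 1.60166, 1.59538),
]

def improvement(before, after):
    """Return (absolute, relative) perplexity decrease after fine-tuning."""
    absolute = before - after
    return absolute, absolute / before

for name, before, after in rows:
    absolute, relative = improvement(before, after)
    print(f"{name:20s} before={before:7.3f} improvement={absolute:.4f} improvement%={relative:.4f}")
```

Small differences in the last digits compared with the table come from the table's before and after values being rounded.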

See more in 01_detection_using_adapter_ft.ipynb

Results using prompting

When using microsoft/phi-2, we get this amount of perplexity reduction by including a summary of the key learnings:

	sample	learning%
3	einsteins theory of general relativity	0.0751468
5	wikipedia on LK-99	0.0674738
8	Schmidhuber 2023 Subjective Novelty, Surprise	0.0396319
1	good_ml	0.0321225
0	bad_ml	-9.58801e-05
2	sokal hoax	-0.0168107
7	AI gen fake paper	-0.134864
4	lorem ipsum	-0.69694
6	I have a dream	-0.796421

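As you can see, some of these are probably in the training set. For example, lorem ipsum and I have a dream have strongly negative values, so adding a summary made them less predictable rather than more.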
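A rough sketch of how such a score could be computed follows. It assumes that learning% is the fractional perplexity reduction, (plain - with summary) / plain, and it uses hypothetical prompt templates; the notebook's actual prompt wording and formula may differ. The helper names `summary_learning` and `make_hf_ppl_fn` are made up for this example.

```python
import math

# Hypothetical prompt templates; the notebook's real prompts may differ.
PLAIN_PREFIX = "Text:\n"
SUMMARY_PREFIX = "Summary: {summary}\n\nText:\n"

def summary_learning(ppl_fn, text, summary):
    """Fractional perplexity reduction from prepending a summary.

    ppl_fn(context, text) must return the perplexity of `text` given `context`.
    Positive values mean the summary made the text more predictable.
    """
    plain = ppl_fn(PLAIN_PREFIX, text)
    primed = ppl_fn(SUMMARY_PREFIX.format(summary=summary), text)
    return (plain - primed) / plain

def make_hf_ppl_fn(model_name="microsoft/phi-2"):
    """Build a ppl_fn backed by a Hugging Face causal LM (needs torch and transformers)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    def ppl_fn(context, text):
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        txt_ids = tokenizer(text, return_tensors="pt", add_special_tokens=False).input_ids
        input_ids = torch.cat([ctx_ids, txt_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : ctx_ids.shape[1]] = -100  # score only the text tokens, not the context
        with torch.no_grad():
            loss = model(input_ids=input_ids, labels=labels).loss  # mean NLL over text tokens
        return math.exp(loss.item())

    return ppl_fn
```

With the model downloaded, `summary_learning(make_hf_ppl_fn(), text, summary)` would give one score per sample. Passing the scorer in as `ppl_fn` keeps the formula separate from the model, so a smaller model or a stub can be swapped in.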

See more in 02_detection_using_tldr_prompt.ipynb

Citing

If you like our work and end up using this code for your research, give us a shout-out by citing or acknowledging:

@misc{wassname2024,
  author = {Clark, M.J.},
  title = {BS Writing Detector},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/wassname/detect_bs_text}},
  commit = {}
}