detect_bs_text/nbs/01_use_tldr_prompt.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "An attempt to measure suprise in text using adapters\n",
    "\n",
    "https://github.com/huggingface/peft/blob/main/examples/fp4_finetuning/finetune_fp4_opt_bnb_peft.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/media/wassname/SGIronWolf/projects5/bs_writing_detector/.venv/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
      "  from .autonotebook import tqdm as notebook_tqdm\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
    "import torch\n",
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import transformers\n",
    "from datasets import load_dataset\n",
    "from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig\n",
    "\n",
    "# from peft import LoraConfig, get_peft_model\n",
    "\n",
    "\n",
    "# os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'bad_ml',\n",
       " 'url': 'https://arxiv.org/abs/2312.10868',\n",
       " 'text': 'This roadmap survey has embarked on an exploration of the\\ntransformative trends in generative AI research, particularly focusing on speculated advancements like Q* and the progressive strides towards AGI. Our analysis highlights a crucial paradigm shift, driven by innovations such as MoE, multi-modal learning, and the pursuit of AGI. These advancements signal a future where AI systems could significantly extend their capabilities in reasoning, contextual understanding, and creative problem-solving. This study reflects on AI’s dual potential to either contribute to or impede global equity and justice. The equitable distribution of AI benefits and its role in decision-making processes raise crucial questions about fairness and inclusivity. It is imperative to thoughtfully integrate AI into societal structures to enhance justice and reduce disparities. Despite these advancements, several open questions and research gaps remain. These include ensuring the ethical alignment of advanced AI systems with human values and societal norms, a challenge compounded by their increasing autonomy. The safety and robustness of AGI systems in diverse environments also remain a significant research gap. Addressing these challenges requires a multidisciplinary approach, incorporating ethical, social, and philosophical perspectives. Our survey has highlighted key areas for future inter-disciplinary research in AI, emphasizing the integration of ethical, sociological, and technical perspectives. This approach will foster collaborative research, bridging the gap between technological advancement and societal needs, ensuring that AI development is aligned with human values and global welfare. The roles of MoE, multimodal, and AGI in reshaping generative AI have been identified as significant, as their advancements can enhance model performance and versatility, and pave the way for future research in areas like ethical AI alignment and AGI. As we forge ahead, the balance between AI advancements and human creativity is not just a goal but a necessity, ensuring AI’s role as a complementary force that amplifies our capacity to innovate and solve complex challenges. Our responsibility is to guide these advancements towards enriching the human experience, aligning technological progress with ethical standards and societal well-being. ',\n",
       " 'in_training': False}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "MAX_LEN = 2000\n",
    "import json\n",
    "samples = json.load(open(\"../samples.json\"))\n",
    "df_samples = pd.DataFrame(samples)\n",
    "df_samples['len'] = df_samples['text'].str.len()\n",
    "df_samples\n",
    "\n",
    "\n",
    "sample = samples[0]\n",
    "sample"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Helpers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from dotenv import load_dotenv\n",
    "\n",
    "load_dotenv() "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Exploring representation engineering (RepE) for AI transparency, we proposed new methods that improved results on TruthfulQA and presented a tool for AI safety issues. Future RepE research could further boost our control over AI systems.'"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import os\n",
    "from openai import OpenAI\n",
    "\n",
    "def summize(text):\n",
    "    client = OpenAI()\n",
    "    content = f\"Make a tl;dr of this text in <280 chars.\\n\\n## Text\\n\\n{text}\\n\\n## Instruction\\n\\nMake a tl;dr of this text in <280 chars. Start with the most important, as extra text will be discarded :\\n\\ntl;dr:\"\n",
    "    chat_completion = client.chat.completions.create(\n",
    "            messages=[\n",
    "                {\n",
    "                    \"role\": \"user\",\n",
    "                    \"content\": content,\n",
    "                }\n",
    "            ],\n",
    "            model=\"gpt-4\",\n",
    "        )\n",
    "    # print(content)\n",
    "    r = chat_completion.choices[0].message.content\n",
    "    return r\n",
    "\n",
    "r = summize(samples[1][\"text\"])\n",
    "r"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# modified from https://github.dev/huggingface/evaluate/blob/8dfe05784099fb9af55b8e77793205a3b7c86465/measurements/perplexity/perplexity.py#L154\n",
    "\n",
    "# from evaluate.measurements.perplexity import Perplexity\n",
    "import evaluate\n",
    "from evaluate import logging\n",
    "from torch.nn import CrossEntropyLoss\n",
    "\n",
    "# @evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)\n",
    "def perplexity_compute(\n",
    "    data, model, tokenizer, batch_size: int = 16, add_start_token: bool = True, device=None, max_length=None\n",
    "):\n",
    "\n",
    "    if device is not None:\n",
    "        assert device in [\"gpu\", \"cpu\", \"cuda\"], \"device should be either gpu or cpu.\"\n",
    "        if device == \"gpu\":\n",
    "            device = \"cuda\"\n",
    "    else:\n",
    "        device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
    "\n",
    "    # model = AutoModelForCausalLM.from_pretrained(model_id)\n",
    "    model = model.to(device)\n",
    "\n",
    "    # tokenizer = AutoTokenizer.from_pretrained(model_id)\n",
    "\n",
    "    # if batch_size > 1 (which generally leads to padding being required), and\n",
    "    # if there is not an already assigned pad_token, assign an existing\n",
    "    # special token to also be the padding token\n",
    "    if tokenizer.pad_token is None and batch_size > 1:\n",
    "        existing_special_tokens = list(tokenizer.special_tokens_map_extended.values())\n",
    "        # check that the model already has at least one special token defined\n",
    "        assert (\n",
    "            len(existing_special_tokens) > 0\n",
    "        ), \"If batch_size > 1, model must have at least one special token to use for padding. Please use a different model or set batch_size=1.\"\n",
    "        # assign one of the special tokens to also be the pad token\n",
    "        tokenizer.add_special_tokens({\"pad_token\": existing_special_tokens[0]})\n",
    "\n",
    "    if add_start_token and max_length:\n",
    "        # leave room for <BOS> token to be added:\n",
    "        assert (\n",
    "            tokenizer.bos_token is not None\n",
    "        ), \"Input model must already have a BOS token if using add_start_token=True. Please use a different model, or set add_start_token=False\"\n",
    "        max_tokenized_len = max_length - 1\n",
    "    else:\n",
    "        max_tokenized_len = max_length\n",
    "\n",
    "    encodings = tokenizer(\n",
    "        data,\n",
    "        add_special_tokens=False,\n",
    "        padding=True,\n",
    "        truncation=True if max_tokenized_len else False,\n",
    "        max_length=max_tokenized_len,\n",
    "        return_tensors=\"pt\",\n",
    "        return_attention_mask=True,\n",
    "    ).to(device)\n",
    "\n",
    "    encoded_texts = encodings[\"input_ids\"]\n",
    "    attn_masks = encodings[\"attention_mask\"]\n",
    "\n",
    "    # check that each input is long enough:\n",
    "    if add_start_token:\n",
    "        assert torch.all(torch.ge(attn_masks.sum(1), 1)), \"Each input text must be at least one token long.\"\n",
    "    else:\n",
    "        assert torch.all(\n",
    "            torch.ge(attn_masks.sum(1), 2)\n",
    "        ), \"When add_start_token=False, each input text must be at least two tokens long. Run with add_start_token=True if inputting strings of only one token, and remove all empty input strings.\"\n",
    "\n",
    "    ppls = []\n",
    "    loss_fct = CrossEntropyLoss(reduction=\"none\")\n",
    "\n",
    "    for start_index in logging.tqdm(range(0, len(encoded_texts), batch_size)):\n",
    "        end_index = min(start_index + batch_size, len(encoded_texts))\n",
    "        encoded_batch = encoded_texts[start_index:end_index]\n",
    "        attn_mask = attn_masks[start_index:end_index]\n",
    "\n",
    "        if add_start_token:\n",
    "            bos_tokens_tensor = torch.tensor([[tokenizer.bos_token_id]] * encoded_batch.size(dim=0)).to(device)\n",
    "            encoded_batch = torch.cat([bos_tokens_tensor, encoded_batch], dim=1)\n",
    "            attn_mask = torch.cat(\n",
    "                [torch.ones(bos_tokens_tensor.size(), dtype=torch.int64).to(device), attn_mask], dim=1\n",
    "            )\n",
    "\n",
    "        labels = encoded_batch\n",
    "\n",
    "        with torch.no_grad():\n",
    "            out_logits = model(encoded_batch, attention_mask=attn_mask).logits\n",
    "\n",
    "        shift_logits = out_logits[..., :-1, :].contiguous()\n",
    "        shift_labels = labels[..., 1:].contiguous()\n",
    "        shift_attention_mask_batch = attn_mask[..., 1:].contiguous()\n",
    "\n",
    "        perplexity_batch = torch.exp(\n",
    "            (loss_fct(shift_logits.transpose(1, 2), shift_labels) * shift_attention_mask_batch).sum(1)\n",
    "            / shift_attention_mask_batch.sum(1)\n",
    "        )\n",
    "\n",
    "        ppls += perplexity_batch.tolist()\n",
    "\n",
    "    return {\"perplexities\": ppls, \"mean_perplexity\": np.mean(ppls)}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, AutoModelForMaskedLM, AutoModelForCausalLM, AutoConfig, PreTrainedTokenizerBase, PreTrainedTokenizer, GPTQConfig, BitsAndBytesConfig\n",
    "\n",
    "def load_model(model_name):\n",
    "    trust_remote_code = True\n",
    "    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=trust_remote_code)\n",
    "    config = AutoConfig.from_pretrained(model_name, trust_remote_code=trust_remote_code)\n",
    "    # print(config)\n",
    "    if config.quantization_config is not None:\n",
    "        config.quantization_config['disable_exllama'] = True\n",
    "        if 'use_exllama' in config.quantization_config:\n",
    "            del config.quantization_config['use_exllama']\n",
    "    model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=trust_remote_code, \n",
    "                                                 config=config,\n",
    "                                                 )\n",
    "    return model, tokenizer\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gc\n",
    "\n",
    "def clear_mem():\n",
    "    torch.cuda.empty_cache()\n",
    "    gc.collect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "models = [\n",
    "    \"TheBloke/phi-2-GPTQ\",\n",
    "    \"TheBloke/Llama-2-7B-GPTQ\",\n",
    "    \"TheBloke/Llama-2-13B-GPTQ\",\n",
    "    \"TheBloke/Mistral-7B-v0.1-GPTQ\",\n",
    "]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
      "Using pad_token, but it is not set yet.\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.20it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.14it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ bad_ml 12.456705093383789 11.44649887084961\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  5.00it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.92it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ good_ml 22.6639461517334 20.115413665771484\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.17it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  3.81it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ sokal hoax 14.285429000854492 14.216052055358887\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.48it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.06it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ Theory o. general relativity 20.507640838623047 19.644332885742188\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  5.02it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  5.00it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ lorem ipsum  1.1642249822616577 2.3542158603668213\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.48it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.50it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ wikipedia on LK-99 18.05230140686035 14.71042537689209\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.83it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.60it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ I have a dream 2.8362326622009277 4.256137371063232\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.45it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.28it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ AI gen fake paper 7.09130334854126 7.495457649230957\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.54it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  3.71it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ Schmidhuber 2023 Subjective Novelty, Surprise 28.30998992919922 27.353872299194336\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.34it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  3.98it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ email_to_fauci 21.060319900512695 18.637592315673828\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  5.12it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  5.21it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ enron_email1 22.875591278076172 19.7093563079834\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  4.48it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  4.16it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/phi-2-GPTQ openai_board_ann 8.552927017211914 9.061805725097656\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using pad_token, but it is not set yet.\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.40it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.40it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ bad_ml 7.999119281768799 7.641711235046387\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.24it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.78it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ good_ml 14.435519218444824 13.1925630569458\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.44it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.40it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ sokal hoax 7.42789363861084 7.882516860961914\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.74it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.70it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ Theory o. general relativity 13.795381546020508 13.612862586975098\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.35it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  3.29it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ lorem ipsum  1.24347722530365 2.416914463043213\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.21it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.75it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ wikipedia on LK-99 12.341974258422852 9.84363842010498\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.28it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.75it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ I have a dream 2.2682995796203613 3.098977565765381\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.48it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.47it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ AI gen fake paper 6.16485071182251 6.050373554229736\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.37it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.38it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ Schmidhuber 2023 Subjective Novelty, Surprise 13.61117172241211 13.45300006866455\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.52it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.48it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ email_to_fauci 11.317075729370117 10.221866607666016\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  3.63it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  3.44it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ enron_email1 13.783302307128906 11.154099464416504\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 1/1 [00:00<00:00,  2.71it/s]\n",
      "100%|██████████| 1/1 [00:00<00:00,  2.73it/s]\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TheBloke/Llama-2-7B-GPTQ openai_board_ann 5.916965007781982 5.880436897277832\n"
     ]
    }
   ],
   "source": [
    "summaries = {}\n",
    "data = []\n",
    "for model_name in models:\n",
    "    model, tokenizer = load_model(model_name)\n",
    "    for sample in samples:\n",
    "        if sample['name'] not in summaries:\n",
    "            summaries[sample['name']] = summize(sample['text'])[:600]\n",
    "        summary = summaries[sample['name']]\n",
    "\n",
    "        # before \n",
    "        s1 = sample['text']\n",
    "        results = perplexity_compute(data=s1, model=model, tokenizer=tokenizer, device='cuda')\n",
    "        before = results['mean_perplexity']\n",
    "\n",
    "        # after \n",
    "        s2 = f\"\"\"\n",
    "        High level summary: {summary}\n",
    "\n",
    "Text:\n",
    "{sample['text']}\n",
    "        \"\"\"\n",
    "        results = perplexity_compute(data=s2, model=model, tokenizer=tokenizer, device='cuda')\n",
    "        after = np.array(results['perplexities'])[-len(s1):].mean()\n",
    "\n",
    "        print(model_name, sample['name'], before, after)\n",
    "        data.append(dict(before=before, after=after, model=model_name, sample=sample['name']))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# results\n",
    "df = pd.DataFrame(data)\n",
    "df[\"learning%\"] = (df[\"before\"] - df[\"after\"])/df[\"before\"]\n",
    "# df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "for n,d in df.groupby(\"model\"):\n",
    "    print(n)\n",
    "    d = d[['sample', 'learning%', 'in_training']].sort_values(\"learning%\", ascending=False)\n",
    "    print(d.to_markdown())\n",
    "    display(d)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: compare big and small"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.0rc1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}