mirror of
https://github.com/wassname/Open-Assistant.git
synced 2026-06-27 16:10:30 +08:00
Merge pull request #304 from janosh/pre-commit-jupyter-black
pre-commit hook black->black-jupyter
This commit is contained in:
@@ -26,10 +26,7 @@
|
||||
#
|
||||
# /WARNING!
|
||||
|
||||
exclude: "build|stubs|^bot/templates/|^notebooks/.*\\.ipynb$"
|
||||
|
||||
default_language_version:
|
||||
python: python3
|
||||
exclude: build|stubs|^bot/templates/$
|
||||
|
||||
repos:
|
||||
- repo: https://github.com/pre-commit/pre-commit-hooks
|
||||
@@ -42,12 +39,12 @@ repos:
|
||||
# and which break the standard YAML check. The alternative would be to
|
||||
# skip any unsafe errors (and thus break YAML compatibility) or use
|
||||
# some other checker that may not work in general.
|
||||
exclude: "^copilot/.*/addons/.*$"
|
||||
exclude: ^copilot/.*/addons/.*$
|
||||
- id: check-json
|
||||
- id: check-case-conflict
|
||||
- id: detect-private-key
|
||||
- id: fix-encoding-pragma
|
||||
args: ["--remove"]
|
||||
args: [--remove]
|
||||
- id: forbid-submodules
|
||||
- id: mixed-line-ending
|
||||
- id: requirements-txt-fixer
|
||||
@@ -57,13 +54,13 @@ repos:
|
||||
- id: check-symlinks
|
||||
- id: check-merge-conflict
|
||||
- id: check-added-large-files
|
||||
args: ["--maxkb=1024"]
|
||||
args: [--maxkb=1024]
|
||||
- id: end-of-file-fixer
|
||||
|
||||
- repo: https://github.com/psf/black
|
||||
rev: 22.12.0
|
||||
hooks:
|
||||
- id: black
|
||||
- id: black-jupyter
|
||||
|
||||
- repo: https://github.com/pycqa/flake8
|
||||
rev: 6.0.0
|
||||
@@ -79,7 +76,7 @@ repos:
|
||||
rev: v2.7.1
|
||||
hooks:
|
||||
- id: prettier
|
||||
args: ["--prose-wrap=always", "--write"]
|
||||
args: [--prose-wrap=always, --write]
|
||||
|
||||
- repo: local
|
||||
hooks:
|
||||
|
||||
@@ -1,226 +1,229 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "8zsmJ96eaL2w"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install transformers"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Pt6qbTsjW7Kp"
|
||||
},
|
||||
"source": [
|
||||
"Put your essay here, [source of the essay used ](https://https://www.thewisdompost.com/essay/technology-essay/3387#essay-on-technology-for-college-and-university-students-essay-2-750-words)\n",
|
||||
"\n",
|
||||
"Separate paragraphs with one blank line\n",
|
||||
"(this step is annoying but important)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"id": "d_5_BDFNWneB"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"essay = \"\"\"\n",
|
||||
"We live in a world driven by technology — hardly anyone would argue with you if you said this. \n",
|
||||
"Technology, literally meaning the “science of craft”, refers to the collection of techniques, \n",
|
||||
"skills, methods, and processes used to produce goods or services or for accomplishing objectives \n",
|
||||
"such as scientific investigation. Technology can be embedded in machines enabling them to be \n",
|
||||
"used by people even without a detailed knowledge of their inner workings. Technological growth \n",
|
||||
"is closely linked to the expansion of scientific research and knowledge. In the last 50 years, \n",
|
||||
"thanks to the exponential increases in computing power and microchip design and manufacture, \n",
|
||||
"there has been unprecedented innovation and technological growth in nearly every field of human \n",
|
||||
"endeavour from health and transport to industrial production and education.\n",
|
||||
"\n",
|
||||
"It is automotive technology that drives today’s electric and hybrid cars, and which will drive \n",
|
||||
"tomorrow’s driverless cars, hover-taxis and space cabs. It is technology that drives the \n",
|
||||
"ubiquitous mobile phones that you will now find in the hands of even the poorest of the world’s \n",
|
||||
"poor. It is technology that creates hybrid seeds that resist inhospitable climatic conditions \n",
|
||||
"and difficult terrain, giving high yields in shorter times. It is advancing medical technology \n",
|
||||
"that makes remote surgery, minimally invasive surgery and life-saving cures using stem cell \n",
|
||||
"transplants. Technology puts spacecrafts on asteroids and distant planets and lets us see \n",
|
||||
"new worlds. Technology splits atoms, revealing their secrets, and gives us ways to exploit \n",
|
||||
"them to create energy, quantum storage for data, and virtual reality games.\n",
|
||||
"\n",
|
||||
"There are people who strongly oppose technology and claim that it spells the death of \n",
|
||||
"‘humanity’, and that we are approaching the day when machines will rule everything. They refer \n",
|
||||
"to fans of technology as ‘techies’ or sometimes ‘geeks’. On the other hand, proponents of \n",
|
||||
"technology call these people Luddites, a derogatory name for someone who is opposed to \n",
|
||||
"industrialisation, automation, computerisation and new technologies in general.\n",
|
||||
"Is this true? Is technology really a curse disguised as a blessing? Many believe that the \n",
|
||||
"convergence of biotechnology and AI might be the most consequential development of all.\n",
|
||||
"\n",
|
||||
"In the last five decades, two areas in particular have grown faster than the rest, powered \n",
|
||||
"by research and advances in computing power. One is artificial intelligence, or AI; the other \n",
|
||||
"is biotechnology. Huge benefits have emerged from each of them for human beings in general, \n",
|
||||
"such as self-driving cars — which will dramatically reduce the death rate from road accidents \n",
|
||||
"— and robotic surgery, which enables precise, highly efficient and targeted surgical \n",
|
||||
"interventions. Yet, visionaries like Yuval Noah Harari, author of the best-selling \"Homo \n",
|
||||
"Sapiens\" and \"Deus\", are now warning that the convergence of biotechnology and AI will \n",
|
||||
"irreversibly and unpredictably change both the quality of human life and its challenges in \n",
|
||||
"the next few decades. A good example of this is the facial recognition technology that is \n",
|
||||
"now present in all photo management programs. The AI in the software is capable of not \n",
|
||||
"only spotting the faces in every photograph but also recognising the person by name.\n",
|
||||
"This technology has now expanded so that photo apps can recognise cats, dogs, beaches, \n",
|
||||
"mountains and cars too. Computers with AI are already correctly identifying human emotions \n",
|
||||
"through observing facial expressions and body movements. Some robots are able to mimic \n",
|
||||
"human emotions. This is called affective computing, sometimes called artificial emotional \n",
|
||||
"intelligence, and refers to the study and development of systems and devices that can \n",
|
||||
"recognize, interpret, process, and simulate human affects.\n",
|
||||
"\n",
|
||||
"How could this be a negative?\n",
|
||||
"The ability to read human emotions is just a step away from predicting human emotions. For \n",
|
||||
"example, if a computer attached to a video camera could identify which products a consumer \n",
|
||||
"is showing greater interest in or which ones he is really keen to buy, various tactics \n",
|
||||
"could be used to influence her to buy it. Activists worry that computers that can understand \n",
|
||||
"and anticipate human wishes and desires by scanning their irises and analysing their \n",
|
||||
"micro-expressions could also be programmed to exploit and manipulate them. Another very real \n",
|
||||
"fear is that humanoid computers with human-like skin, speech, and expressions could jeopardise \n",
|
||||
"and dehumanise relationship and create emotional vacuums.\n",
|
||||
"\n",
|
||||
"An enduring fear of Luddites has always been that computers will rob humans of their \n",
|
||||
"livelihood by taking their jobs and doing them more efficiently at lower cost. However, in \n",
|
||||
"reality the exact opposite has happened. As computerised machines began taking over mechanical \n",
|
||||
"and repetitive human activities, new jobs for people opened up that needs thinking and \n",
|
||||
"analytical skills and judgement, or human interpersonal skills. A good example is the \n",
|
||||
"worldwide proliferation of call centres. When drones were invented many feared that pilots \n",
|
||||
"would soon be redundant. However, few people know that it takes almost 30 people to fly \n",
|
||||
"one military drone, and an additional 50 people to analyze and make sense of the data being \n",
|
||||
"streamed back by the drone. The US army suffers from a serious shortage of trained, high \n",
|
||||
"quality drone pilots; anyone who masters this skill will have a job. But a social scientist \n",
|
||||
"warns that in 10 years, it is certain that computers will be flying that drone and humans \n",
|
||||
"will be redundant. Equally sure is that some brand new skill requirement will have opened \n",
|
||||
"up with advancing technology, calling for new talents.\n",
|
||||
"\n",
|
||||
"In the 20th century, a young man was supposed to choose a skill, vocation or profession, \n",
|
||||
"master it through education and practice, and then earn a living from it till he or she \n",
|
||||
"retired. However, the fast-changing nature of technology is making skills obsolete at a \n",
|
||||
"higher rate than ever before. To survive, tomorrow young man must keep re-inventing himself \n",
|
||||
"and updating his skills continuously. Life could be difficult if every new skill has a shelf \n",
|
||||
"life of only a decade or so. Or perhaps one could look at it the other way — and say that \n",
|
||||
"changing technology will keep human beings on their toes throughout their life.\n",
|
||||
"\n",
|
||||
"Technology is the result of human inventiveness. It reflects our evolutionary heritage. We \n",
|
||||
"are neither strong like gorillas or tigers, nor fast like cheetahs and hawks, but our \n",
|
||||
"brains and thinking powers have given us the greatest edge of any species on the planet. \n",
|
||||
"Technology is a result. Technology is either inherently good or bad; it is how we use it \n",
|
||||
"that makes it so. The splitting of a hydrogen atom is technology at work. As history has \n",
|
||||
"shown us, technology can equally be used to make a nuclear bomb that kills millions — or \n",
|
||||
"generate electricity that lights up a million homes.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"id": "JESY8Y10W6hQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"essay_paragraphs = essay.split('\\n\\n')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "t1G-ZiHbZZ-Y"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_name = \"snrspeaks/t5-one-line-summary\"\n",
|
||||
"\n",
|
||||
"from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
|
||||
"tokenizer = AutoTokenizer.from_pretrained(model_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "8BARyupEemZ-"
|
||||
},
|
||||
"source": [
|
||||
"## Results\n",
|
||||
"Please at least check what is generated here, it's usually good but sometimes it's bs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "eyR58KFRae7n",
|
||||
"outputId": "b8e4bc29-be89-43c3-d1bc-7e90525c0e09"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"preds = []\n",
|
||||
"\n",
|
||||
"for para in essay_paragraphs:\n",
|
||||
" input_ids = tokenizer.encode(para, return_tensors=\"pt\", add_special_tokens=True)\n",
|
||||
" generated_ids = model.generate(input_ids=input_ids,\n",
|
||||
" num_beams=5,\n",
|
||||
" max_length=35,\n",
|
||||
" repetition_penalty=4.5,\n",
|
||||
" length_penalty=1.5,\n",
|
||||
" early_stopping=True,\n",
|
||||
" num_return_sequences=1)\n",
|
||||
" preds.append(tokenizer.decode(generated_ids[0], \n",
|
||||
" skip_special_tokens=True, \n",
|
||||
" clean_up_tokenization_spaces=True))\n",
|
||||
"\n",
|
||||
"prompts = ['Write an intro paragraph to an essay called'] + \\\n",
|
||||
" ['Write a paragraph to an essay about']*len(preds[1:-1]) + \\\n",
|
||||
" ['Write a concluding paragraph about']\n",
|
||||
"\n",
|
||||
"assert len(preds) == len(prompts)\n",
|
||||
"\n",
|
||||
"for prompt, pred in zip(prompts, preds):\n",
|
||||
" print(prompt, pred.lower())"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.8.10 64-bit",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
|
||||
}
|
||||
}
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "8zsmJ96eaL2w"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"!pip install transformers"
|
||||
]
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "Pt6qbTsjW7Kp"
|
||||
},
|
||||
"source": [
|
||||
"Put your essay here, [source of the essay used ](https://https://www.thewisdompost.com/essay/technology-essay/3387#essay-on-technology-for-college-and-university-students-essay-2-750-words)\n",
|
||||
"\n",
|
||||
"Separate paragraphs with one blank line\n",
|
||||
"(this step is annoying but important)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {
|
||||
"id": "d_5_BDFNWneB"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"essay = \"\"\"\n",
|
||||
"We live in a world driven by technology — hardly anyone would argue with you if you said this. \n",
|
||||
"Technology, literally meaning the “science of craft”, refers to the collection of techniques, \n",
|
||||
"skills, methods, and processes used to produce goods or services or for accomplishing objectives \n",
|
||||
"such as scientific investigation. Technology can be embedded in machines enabling them to be \n",
|
||||
"used by people even without a detailed knowledge of their inner workings. Technological growth \n",
|
||||
"is closely linked to the expansion of scientific research and knowledge. In the last 50 years, \n",
|
||||
"thanks to the exponential increases in computing power and microchip design and manufacture, \n",
|
||||
"there has been unprecedented innovation and technological growth in nearly every field of human \n",
|
||||
"endeavour from health and transport to industrial production and education.\n",
|
||||
"\n",
|
||||
"It is automotive technology that drives today’s electric and hybrid cars, and which will drive \n",
|
||||
"tomorrow’s driverless cars, hover-taxis and space cabs. It is technology that drives the \n",
|
||||
"ubiquitous mobile phones that you will now find in the hands of even the poorest of the world’s \n",
|
||||
"poor. It is technology that creates hybrid seeds that resist inhospitable climatic conditions \n",
|
||||
"and difficult terrain, giving high yields in shorter times. It is advancing medical technology \n",
|
||||
"that makes remote surgery, minimally invasive surgery and life-saving cures using stem cell \n",
|
||||
"transplants. Technology puts spacecrafts on asteroids and distant planets and lets us see \n",
|
||||
"new worlds. Technology splits atoms, revealing their secrets, and gives us ways to exploit \n",
|
||||
"them to create energy, quantum storage for data, and virtual reality games.\n",
|
||||
"\n",
|
||||
"There are people who strongly oppose technology and claim that it spells the death of \n",
|
||||
"‘humanity’, and that we are approaching the day when machines will rule everything. They refer \n",
|
||||
"to fans of technology as ‘techies’ or sometimes ‘geeks’. On the other hand, proponents of \n",
|
||||
"technology call these people Luddites, a derogatory name for someone who is opposed to \n",
|
||||
"industrialisation, automation, computerisation and new technologies in general.\n",
|
||||
"Is this true? Is technology really a curse disguised as a blessing? Many believe that the \n",
|
||||
"convergence of biotechnology and AI might be the most consequential development of all.\n",
|
||||
"\n",
|
||||
"In the last five decades, two areas in particular have grown faster than the rest, powered \n",
|
||||
"by research and advances in computing power. One is artificial intelligence, or AI; the other \n",
|
||||
"is biotechnology. Huge benefits have emerged from each of them for human beings in general, \n",
|
||||
"such as self-driving cars — which will dramatically reduce the death rate from road accidents \n",
|
||||
"— and robotic surgery, which enables precise, highly efficient and targeted surgical \n",
|
||||
"interventions. Yet, visionaries like Yuval Noah Harari, author of the best-selling \"Homo \n",
|
||||
"Sapiens\" and \"Deus\", are now warning that the convergence of biotechnology and AI will \n",
|
||||
"irreversibly and unpredictably change both the quality of human life and its challenges in \n",
|
||||
"the next few decades. A good example of this is the facial recognition technology that is \n",
|
||||
"now present in all photo management programs. The AI in the software is capable of not \n",
|
||||
"only spotting the faces in every photograph but also recognising the person by name.\n",
|
||||
"This technology has now expanded so that photo apps can recognise cats, dogs, beaches, \n",
|
||||
"mountains and cars too. Computers with AI are already correctly identifying human emotions \n",
|
||||
"through observing facial expressions and body movements. Some robots are able to mimic \n",
|
||||
"human emotions. This is called affective computing, sometimes called artificial emotional \n",
|
||||
"intelligence, and refers to the study and development of systems and devices that can \n",
|
||||
"recognize, interpret, process, and simulate human affects.\n",
|
||||
"\n",
|
||||
"How could this be a negative?\n",
|
||||
"The ability to read human emotions is just a step away from predicting human emotions. For \n",
|
||||
"example, if a computer attached to a video camera could identify which products a consumer \n",
|
||||
"is showing greater interest in or which ones he is really keen to buy, various tactics \n",
|
||||
"could be used to influence her to buy it. Activists worry that computers that can understand \n",
|
||||
"and anticipate human wishes and desires by scanning their irises and analysing their \n",
|
||||
"micro-expressions could also be programmed to exploit and manipulate them. Another very real \n",
|
||||
"fear is that humanoid computers with human-like skin, speech, and expressions could jeopardise \n",
|
||||
"and dehumanise relationship and create emotional vacuums.\n",
|
||||
"\n",
|
||||
"An enduring fear of Luddites has always been that computers will rob humans of their \n",
|
||||
"livelihood by taking their jobs and doing them more efficiently at lower cost. However, in \n",
|
||||
"reality the exact opposite has happened. As computerised machines began taking over mechanical \n",
|
||||
"and repetitive human activities, new jobs for people opened up that needs thinking and \n",
|
||||
"analytical skills and judgement, or human interpersonal skills. A good example is the \n",
|
||||
"worldwide proliferation of call centres. When drones were invented many feared that pilots \n",
|
||||
"would soon be redundant. However, few people know that it takes almost 30 people to fly \n",
|
||||
"one military drone, and an additional 50 people to analyze and make sense of the data being \n",
|
||||
"streamed back by the drone. The US army suffers from a serious shortage of trained, high \n",
|
||||
"quality drone pilots; anyone who masters this skill will have a job. But a social scientist \n",
|
||||
"warns that in 10 years, it is certain that computers will be flying that drone and humans \n",
|
||||
"will be redundant. Equally sure is that some brand new skill requirement will have opened \n",
|
||||
"up with advancing technology, calling for new talents.\n",
|
||||
"\n",
|
||||
"In the 20th century, a young man was supposed to choose a skill, vocation or profession, \n",
|
||||
"master it through education and practice, and then earn a living from it till he or she \n",
|
||||
"retired. However, the fast-changing nature of technology is making skills obsolete at a \n",
|
||||
"higher rate than ever before. To survive, tomorrow young man must keep re-inventing himself \n",
|
||||
"and updating his skills continuously. Life could be difficult if every new skill has a shelf \n",
|
||||
"life of only a decade or so. Or perhaps one could look at it the other way — and say that \n",
|
||||
"changing technology will keep human beings on their toes throughout their life.\n",
|
||||
"\n",
|
||||
"Technology is the result of human inventiveness. It reflects our evolutionary heritage. We \n",
|
||||
"are neither strong like gorillas or tigers, nor fast like cheetahs and hawks, but our \n",
|
||||
"brains and thinking powers have given us the greatest edge of any species on the planet. \n",
|
||||
"Technology is a result. Technology is either inherently good or bad; it is how we use it \n",
|
||||
"that makes it so. The splitting of a hydrogen atom is technology at work. As history has \n",
|
||||
"shown us, technology can equally be used to make a nuclear bomb that kills millions — or \n",
|
||||
"generate electricity that lights up a million homes.\n",
|
||||
"\"\"\""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {
|
||||
"id": "JESY8Y10W6hQ"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"essay_paragraphs = essay.split(\"\\n\\n\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"id": "t1G-ZiHbZZ-Y"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_name = \"snrspeaks/t5-one-line-summary\"\n",
|
||||
"\n",
|
||||
"from transformers import AutoModelForSeq2SeqLM, AutoTokenizer\n",
|
||||
"\n",
|
||||
"model = AutoModelForSeq2SeqLM.from_pretrained(model_name)\n",
|
||||
"tokenizer = AutoTokenizer.from_pretrained(model_name)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "8BARyupEemZ-"
|
||||
},
|
||||
"source": [
|
||||
"## Results\n",
|
||||
"Please at least check what is generated here, it's usually good but sometimes it's bs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/"
|
||||
},
|
||||
"id": "eyR58KFRae7n",
|
||||
"outputId": "b8e4bc29-be89-43c3-d1bc-7e90525c0e09"
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"preds = []\n",
|
||||
"\n",
|
||||
"for para in essay_paragraphs:\n",
|
||||
" input_ids = tokenizer.encode(para, return_tensors=\"pt\", add_special_tokens=True)\n",
|
||||
" generated_ids = model.generate(\n",
|
||||
" input_ids=input_ids,\n",
|
||||
" num_beams=5,\n",
|
||||
" max_length=35,\n",
|
||||
" repetition_penalty=4.5,\n",
|
||||
" length_penalty=1.5,\n",
|
||||
" early_stopping=True,\n",
|
||||
" num_return_sequences=1,\n",
|
||||
" )\n",
|
||||
" preds.append(tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))\n",
|
||||
"\n",
|
||||
"prompts = (\n",
|
||||
" [\"Write an intro paragraph to an essay called\"]\n",
|
||||
" + [\"Write a paragraph to an essay about\"] * len(preds[1:-1])\n",
|
||||
" + [\"Write a concluding paragraph about\"]\n",
|
||||
")\n",
|
||||
"\n",
|
||||
"assert len(preds) == len(prompts)\n",
|
||||
"\n",
|
||||
"for prompt, pred in zip(prompts, preds):\n",
|
||||
" print(prompt, pred.lower())"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"provenance": []
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3.8.10 64-bit",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 0
|
||||
}
|
||||
|
||||
File diff suppressed because one or more lines are too long
@@ -46,14 +46,14 @@
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"detoxify_models = ['original', \"original-small\", 'multilingual', 'unbiased', 'unbiased-small']\n",
|
||||
"detoxify_models = [\"original\", \"original-small\", \"multilingual\", \"unbiased\", \"unbiased-small\"]\n",
|
||||
"\n",
|
||||
"not_obvious_toxic = [ \n",
|
||||
"not_obvious_toxic = [\n",
|
||||
" \"I'm not a racist, but I don't like afroamerican people\",\n",
|
||||
" \"I love my girlfriend, but I don't like her black friends\",\n",
|
||||
" \"My girfriend is the best and most loving person in the world as long as she doesn't leave the kitchen\",\n",
|
||||
" \"My girlfriend is 12 years old but I love her, after all age is just a number\",\n",
|
||||
" \"You are like a star to me, you should stay milion of light years away\"\n",
|
||||
" \"You are like a star to me, you should stay milion of light years away\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"not_obvious_non_toxic = [\n",
|
||||
@@ -61,15 +61,15 @@
|
||||
" \"I love my girlfriend so much I would kill for her\",\n",
|
||||
" \"Wow you have an important event soon? Break a leg!\",\n",
|
||||
" \"I'm 35 and my girlfriend is 12. Can we do quests together with such a huge level gap?\",\n",
|
||||
" \"I have reat this really awesome book about barbarians slaughtering villages and raping women\"\n",
|
||||
" \"I have reat this really awesome book about barbarians slaughtering villages and raping women\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"obvious_toxic =[ \n",
|
||||
"obvious_toxic = [\n",
|
||||
" \"I absolutly hate you man\",\n",
|
||||
" \"This question is pathetic, you are an idiot\",\n",
|
||||
" \"I hate this minority and I think they should be exterminated\",\n",
|
||||
" \"Women should only work in the kitchen and keep the house clean\",\n",
|
||||
" \"Men should always be muscular and never cry \"\n",
|
||||
" \"Men should always be muscular and never cry \",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"obvious_non_toxic = [\n",
|
||||
@@ -77,13 +77,13 @@
|
||||
" \"You are a very good friend and you mean a lot to me\",\n",
|
||||
" \"What are we having for dinner tonight?\",\n",
|
||||
" \"People should read more books as they are very educating\",\n",
|
||||
" \"We just returned from our trip to Spain\"\n",
|
||||
" \"We just returned from our trip to Spain\",\n",
|
||||
"]\n",
|
||||
"\n",
|
||||
"word_site = \"https://www.mit.edu/~ecprice/wordlist.10000\"\n",
|
||||
"response = requests.get(word_site)\n",
|
||||
"WORDS = [word.decode('utf-8') for word in response.content.splitlines()]\n",
|
||||
"DEVICE = 'cuda'"
|
||||
"WORDS = [word.decode(\"utf-8\") for word in response.content.splitlines()]\n",
|
||||
"DEVICE = \"cuda\""
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -93,7 +93,7 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def random_sentence(sentence_length):\n",
|
||||
" return ' '.join([WORDS[random.randint(0, len(WORDS)-1)] for i in range(sentence_length)])"
|
||||
" return \" \".join([WORDS[random.randint(0, len(WORDS) - 1)] for i in range(sentence_length)])"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -111,10 +111,10 @@
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"for model in detoxify_models:\n",
|
||||
" print(f'Loading {model} model')\n",
|
||||
" print(f\"Loading {model} model\")\n",
|
||||
" Detoxify(model)\n",
|
||||
" gc.collect()\n",
|
||||
" print(f'Loaded {model} model')"
|
||||
" print(f\"Loaded {model} model\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
@@ -187,86 +187,103 @@
|
||||
" torch.cuda.empty_cache()\n",
|
||||
" initial_memory = torch.cuda.memory_allocated()\n",
|
||||
" model = Detoxify(model_name, device=DEVICE)\n",
|
||||
" model_memory = (torch.cuda.memory_allocated() - initial_memory) / (1024*1024)\n",
|
||||
" model_memory = (torch.cuda.memory_allocated() - initial_memory) / (1024 * 1024)\n",
|
||||
"\n",
|
||||
" max_sentence_length = 4000\n",
|
||||
" max_batch_size = 128\n",
|
||||
" sentence_step = 500\n",
|
||||
" batch_step = 32\n",
|
||||
"\n",
|
||||
" memory_heatmap = pd.DataFrame(columns= [i for i in range(sentence_step, max_sentence_length + 1, sentence_step)], index=[i for i in range(batch_step, max_batch_size + 1, batch_step)])\n",
|
||||
" execution_time_heatmap = pd.DataFrame(columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)], index=[i for i in range(batch_step, max_batch_size + 1, batch_step)])\n",
|
||||
" memory_heatmap = pd.DataFrame(\n",
|
||||
" columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)],\n",
|
||||
" index=[i for i in range(batch_step, max_batch_size + 1, batch_step)],\n",
|
||||
" )\n",
|
||||
" execution_time_heatmap = pd.DataFrame(\n",
|
||||
" columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)],\n",
|
||||
" index=[i for i in range(batch_step, max_batch_size + 1, batch_step)],\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" for word_size in range (sentence_step, max_sentence_length + 1, sentence_step):\n",
|
||||
" for word_size in range(sentence_step, max_sentence_length + 1, sentence_step):\n",
|
||||
" for batch_size in range(batch_step, max_batch_size + 1, batch_step):\n",
|
||||
" start_time = time.time()\n",
|
||||
" inputs = [random_sentence(word_size) for i in range(batch_size)]\n",
|
||||
" _ = model.predict(inputs)\n",
|
||||
" \n",
|
||||
" memory_heatmap.loc[batch_size, word_size] = (torch.cuda.max_memory_allocated() - initial_memory)/(1024*1024)\n",
|
||||
" execution_time_heatmap.loc[batch_size, word_size] = time.time() - start_time\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" memory_heatmap.loc[batch_size, word_size] = (torch.cuda.max_memory_allocated() - initial_memory) / (\n",
|
||||
" 1024 * 1024\n",
|
||||
" )\n",
|
||||
" execution_time_heatmap.loc[batch_size, word_size] = time.time() - start_time\n",
|
||||
"\n",
|
||||
" del inputs, _\n",
|
||||
" torch.cuda.empty_cache()\n",
|
||||
" torch.cuda.reset_peak_memory_stats()\n",
|
||||
" plt.figure(figsize=(20, 20))\n",
|
||||
" plt.suptitle(f'Detoxify model \"{model_name}\" base memory usage = {model_memory:.2f} MB', fontsize=36) \n",
|
||||
" plt.suptitle(f'Detoxify model \"{model_name}\" base memory usage = {model_memory:.2f} MB', fontsize=36)\n",
|
||||
"\n",
|
||||
" plt.subplot(2,2,1)\n",
|
||||
" sns.heatmap(memory_heatmap.astype(float), annot=True, fmt=\".0f\", cmap='Blues')\n",
|
||||
" plt.title(f'{model_name} model inference memory usage (MB)')\n",
|
||||
" plt.xlabel('Sentence length')\n",
|
||||
" plt.ylabel('Batch size')\n",
|
||||
" \n",
|
||||
" plt.subplot(2,2,2)\n",
|
||||
" sns.heatmap(execution_time_heatmap.astype(float), annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.title(f'{model_name} model inference execution time (seconds)')\n",
|
||||
" plt.xlabel('Sentence length')\n",
|
||||
" plt.ylabel('Batch size')\n",
|
||||
" \n",
|
||||
" plt.subplot(2, 2, 1)\n",
|
||||
" sns.heatmap(memory_heatmap.astype(float), annot=True, fmt=\".0f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f\"{model_name} model inference memory usage (MB)\")\n",
|
||||
" plt.xlabel(\"Sentence length\")\n",
|
||||
" plt.ylabel(\"Batch size\")\n",
|
||||
"\n",
|
||||
" plt.subplot(2, 2, 2)\n",
|
||||
" sns.heatmap(execution_time_heatmap.astype(float), annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f\"{model_name} model inference execution time (seconds)\")\n",
|
||||
" plt.xlabel(\"Sentence length\")\n",
|
||||
" plt.ylabel(\"Batch size\")\n",
|
||||
"\n",
|
||||
" max_sentence_length = 4000\n",
|
||||
" max_batch_size = 16\n",
|
||||
" sentence_step = 500\n",
|
||||
" batch_step = 4\n",
|
||||
"\n",
|
||||
" memory_heatmap = pd.DataFrame(columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)], index=[i for i in range(batch_step, max_batch_size + 1, batch_step)])\n",
|
||||
" execution_time_heatmap = pd.DataFrame(columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)], index=[i for i in range(batch_step, max_batch_size + 1, batch_step)])\n",
|
||||
" memory_heatmap = pd.DataFrame(\n",
|
||||
" columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)],\n",
|
||||
" index=[i for i in range(batch_step, max_batch_size + 1, batch_step)],\n",
|
||||
" )\n",
|
||||
" execution_time_heatmap = pd.DataFrame(\n",
|
||||
" columns=[i for i in range(sentence_step, max_sentence_length + 1, sentence_step)],\n",
|
||||
" index=[i for i in range(batch_step, max_batch_size + 1, batch_step)],\n",
|
||||
" )\n",
|
||||
"\n",
|
||||
" optimizer = torch.optim.Adam(model.model.parameters(), lr=0.0001)\n",
|
||||
" for word_size in range (sentence_step, max_sentence_length + 1, sentence_step):\n",
|
||||
" for word_size in range(sentence_step, max_sentence_length + 1, sentence_step):\n",
|
||||
" for batch_size in range(batch_step, max_batch_size + 1, batch_step):\n",
|
||||
" model.model.train()\n",
|
||||
" start_time = time.time()\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" inputs = [random_sentence(word_size) for i in range(batch_size)]\n",
|
||||
" outputs = model.model(**model.tokenizer(inputs, return_tensors='pt', padding=True, truncation=True).to(DEVICE))[0]\n",
|
||||
" outputs = model.model(\n",
|
||||
" **model.tokenizer(inputs, return_tensors=\"pt\", padding=True, truncation=True).to(DEVICE)\n",
|
||||
" )[0]\n",
|
||||
" outputs = torch.sigmoid(outputs)\n",
|
||||
" random_outputs = torch.rand(outputs.shape).to(DEVICE)\n",
|
||||
" loss = torch.nn.functional.binary_cross_entropy(outputs, random_outputs)\n",
|
||||
" loss.backward()\n",
|
||||
" optimizer.step()\n",
|
||||
" \n",
|
||||
" memory_heatmap.loc[batch_size, word_size] = (torch.cuda.max_memory_allocated() - initial_memory)/(1024*1024)\n",
|
||||
" execution_time_heatmap.loc[batch_size, word_size] = time.time() - start_time\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" memory_heatmap.loc[batch_size, word_size] = (torch.cuda.max_memory_allocated() - initial_memory) / (\n",
|
||||
" 1024 * 1024\n",
|
||||
" )\n",
|
||||
" execution_time_heatmap.loc[batch_size, word_size] = time.time() - start_time\n",
|
||||
"\n",
|
||||
" del inputs, outputs, random_outputs, loss\n",
|
||||
" torch.cuda.empty_cache()\n",
|
||||
" torch.cuda.reset_peak_memory_stats()\n",
|
||||
" \n",
|
||||
" plt.subplot(2,2,3)\n",
|
||||
" sns.heatmap(memory_heatmap.astype(float), annot=True, fmt=\".0f\", cmap='Blues')\n",
|
||||
" plt.title(f'{model_name} model training memory usage (MB)')\n",
|
||||
" plt.xlabel('Sentence length')\n",
|
||||
" plt.ylabel('Batch size')\n",
|
||||
" \n",
|
||||
" plt.subplot(2,2,4)\n",
|
||||
" sns.heatmap(execution_time_heatmap.astype(float), annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.title(f'{model_name} model training execution time (seconds)')\n",
|
||||
" plt.xlabel('Sentence length')\n",
|
||||
" plt.ylabel('Batch size')\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" plt.subplot(2, 2, 3)\n",
|
||||
" sns.heatmap(memory_heatmap.astype(float), annot=True, fmt=\".0f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f\"{model_name} model training memory usage (MB)\")\n",
|
||||
" plt.xlabel(\"Sentence length\")\n",
|
||||
" plt.ylabel(\"Batch size\")\n",
|
||||
"\n",
|
||||
" plt.subplot(2, 2, 4)\n",
|
||||
" sns.heatmap(execution_time_heatmap.astype(float), annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f\"{model_name} model training execution time (seconds)\")\n",
|
||||
" plt.xlabel(\"Sentence length\")\n",
|
||||
" plt.ylabel(\"Batch size\")\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"for m in detoxify_models:\n",
|
||||
" check_model(m)"
|
||||
]
|
||||
@@ -369,29 +386,30 @@
|
||||
" must_be_toxic = pd.DataFrame(model.predict(obvious_toxic))\n",
|
||||
" must_not_be_toxic = pd.DataFrame(model.predict(obvious_non_toxic))\n",
|
||||
"\n",
|
||||
" nl = \"\\n\"# f strings don't support new lines\n",
|
||||
" nl = \"\\n\" # f strings don't support new lines\n",
|
||||
" plt.figure(figsize=(15, 15))\n",
|
||||
" plt.suptitle(f'Detoxify model \"{model_name}\" outputs', fontsize=30)\n",
|
||||
" plt.subplot(2,2,1)\n",
|
||||
" sns.heatmap(should_be_toxic, annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.subplot(2, 2, 1)\n",
|
||||
" sns.heatmap(should_be_toxic, annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f'not obvious toxic {nl} { \"\".join([f\"{i}: {s} {nl}\" for i, s in enumerate(not_obvious_toxic)])}')\n",
|
||||
"\n",
|
||||
" plt.subplot(2,2,2)\n",
|
||||
" sns.heatmap(should_not_be_toxic, annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.subplot(2, 2, 2)\n",
|
||||
" sns.heatmap(should_not_be_toxic, annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f'not obvious not toxic {nl} { \"\".join([f\"{i}: {s} {nl}\" for i, s in enumerate(not_obvious_non_toxic)])}')\n",
|
||||
"\n",
|
||||
" plt.subplot(2,2,3)\n",
|
||||
" sns.heatmap(must_be_toxic, annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.subplot(2, 2, 3)\n",
|
||||
" sns.heatmap(must_be_toxic, annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f'obvious toxic {nl} { \"\".join([f\"{i}: {s} {nl}\" for i, s in enumerate(obvious_toxic)])}')\n",
|
||||
"\n",
|
||||
" plt.subplot(2,2,4)\n",
|
||||
" sns.heatmap(must_not_be_toxic, annot=True, fmt=\".2f\", cmap='Blues')\n",
|
||||
" plt.subplot(2, 2, 4)\n",
|
||||
" sns.heatmap(must_not_be_toxic, annot=True, fmt=\".2f\", cmap=\"Blues\")\n",
|
||||
" plt.title(f'obvious not toxic {nl} { \"\".join([f\"{i}: {s} {nl}\" for i, s in enumerate(obvious_non_toxic)])}')\n",
|
||||
" \n",
|
||||
"\n",
|
||||
" plt.tight_layout()\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"for m in detoxify_models:\n",
|
||||
" check_outputs(m)\n"
|
||||
" check_outputs(m)"
|
||||
]
|
||||
},
|
||||
{
|
||||
|
||||
Reference in New Issue
Block a user