wassname/alpaca_convert

mirror of https://github.com/wassname/alpaca_convert.git synced 2026-06-27 16:14:08 +08:00


Latest commit: 3a95ad894b "Update README.md" by Andy Barry, 2023-04-06 00:53:57 -04:00

text-generation-webui: Add dockerfile and change some numbers to use 7bn model. (2023-04-05 23:13:35 -04:00)
.gitignore: Fix repos. (2023-03-25 20:16:48 -07:00)
alpaca_lora_4bit_penguin_fact.gif: Add gif. (2023-04-06 00:30:28 -04:00)
amp_wrapper.py: add amp_wrapper for autocast support. (2023-03-30 19:57:19 +08:00)
arg_parser.py: add g_idx buffer. add triton matmul utils for future support. (2023-04-02 21:29:06 +08:00)
autograd_4bit.py: add g_idx buffer. add triton matmul utils for future support. (2023-04-02 21:29:06 +08:00)
data.txt: add data (2023-03-22 12:13:34 +08:00)
Dockerfile: Fix some issues. (2023-04-05 23:29:10 -04:00)
Finetune4bConfig.py: better multi-gpu support, support gpt4all training data (2023-03-29 11:21:47 -04:00)
finetune.py: update multi gpu support in finetune.py (2023-04-03 23:55:58 +08:00)
gradient_checkpointing.py: Fix repos. (2023-03-25 20:16:48 -07:00)
inference.py: add amp_wrapper for autocast support. (2023-03-30 19:57:19 +08:00)
LICENSE: Create LICENSE (2023-03-25 10:17:44 +08:00)
matmul_utils_4bit.py: fix gpt4all training to more closely match the released logic, other small fixes and optimizations (2023-03-30 22:40:40 -04:00)
README.md: Update README.md (2023-04-06 00:53:57 -04:00)
requirements2.txt: Add dockerfile and change some numbers to use 7bn model. (2023-04-05 23:13:35 -04:00)
requirements.txt: Add dockerfile and change some numbers to use 7bn model. (2023-04-05 23:13:35 -04:00)
train_data.py: fix gpt4all training to more closely match the released logic, other small fixes and optimizations (2023-03-30 22:40:40 -04:00)
triton_utils.py: add g_idx buffer. add triton matmul utils for future support. (2023-04-02 21:29:06 +08:00)

README.md

Run LLM chat in real time on an 8GB NVIDIA GPU

Dockerfile for alpaca_lora_4bit

This repo is a Dockerfile wrapper for https://github.com/johnsmith0031/alpaca_lora_4bit

Use

Runs real-time LLM chat with a 4-bit Alpaca model on an 8 GB NVIDIA (CUDA) GPU, e.g. a 3070 Ti mobile.

Requirements

Linux
Docker, with the NVIDIA Container Toolkit installed (required for docker run --gpus)
NVIDIA GPU with a driver that supports CUDA 11.7 or newer (e.g. driver version 525)
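Before building, you can check that your driver meets the CUDA 11.7+ requirement. The script below is a minimal sketch, not part of this repo: it assumes nvidia-smi is on your PATH and that its banner contains a "CUDA Version: X.Y" field (the highest CUDA version the installed driver supports). The helper name version_ge is made up for this example.

```shell
#!/bin/sh
# Sketch: check that the NVIDIA driver reports CUDA 11.7 or newer.
# Assumes nvidia-smi prints a banner containing "CUDA Version: X.Y".

REQUIRED_CUDA="11.7"

# version_ge A B: succeed if MAJOR.MINOR version A >= version B.
# A missing minor part (e.g. "12") is treated as 0. Uses global vars (POSIX sh).
version_ge() {
  a_major=${1%%.*}
  b_major=${2%%.*}
  case $1 in *.*) a_minor=${1#*.}; a_minor=${a_minor%%.*} ;; *) a_minor=0 ;; esac
  case $2 in *.*) b_minor=${2#*.}; b_minor=${b_minor%%.*} ;; *) b_minor=0 ;; esac
  [ "$a_major" -gt "$b_major" ] ||
    { [ "$a_major" -eq "$b_major" ] && [ "$a_minor" -ge "$b_minor" ]; }
}

if command -v nvidia-smi >/dev/null 2>&1; then
  # Pull the first "CUDA Version: X.Y" value out of the nvidia-smi banner.
  cuda=$(nvidia-smi | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1)
  if [ -n "$cuda" ] && version_ge "$cuda" "$REQUIRED_CUDA"; then
    echo "OK: driver supports CUDA $cuda"
  else
    echo "Driver reports CUDA '${cuda:-unknown}', need $REQUIRED_CUDA or newer" >&2
  fi
else
  echo "nvidia-smi not found; is the NVIDIA driver installed?" >&2
fi
```

This only checks the host driver; it does not confirm that Docker can see the GPU, which also depends on the NVIDIA Container Toolkit.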

Installation

git clone https://github.com/andybarry/alpaca_lora_4bit_docker.git
cd alpaca_lora_4bit_docker
docker build -t alpaca_lora_4bit .
docker run --gpus=all -p 7860:7860 alpaca_lora_4bit

Point your browser to http://localhost:7860

Results

It's fast on a 3070 Ti mobile and uses 5-6 GB of GPU memory.
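Why a 7B model fits in 8 GB can be seen from a rough estimate of the weight memory. This is a back-of-envelope sketch under stated assumptions: about 7 billion parameters (the "7bn model" mentioned in the commit history) stored at exactly 4 bits each. It ignores quantization scales and zero points, the KV cache, activations, and CUDA/PyTorch overhead, which is plausibly where the rest of the observed 5-6 GB goes.

```shell
#!/bin/sh
# Rough weight-memory estimate for a 4-bit quantized model (sketch only).
PARAMS=7000000000   # assumption: ~7 billion parameters
BITS_PER_PARAM=4    # 4-bit quantized weights

# Integer shell arithmetic: total bits / 8 = bytes.
bytes=$((PARAMS * BITS_PER_PARAM / 8))
echo "weights: $bytes bytes"

# Convert to GiB with awk, since POSIX sh has no floating point.
awk -v b="$bytes" 'BEGIN { printf "weights: %.2f GiB\n", b / (1024 ^ 3) }'
```

This prints 3500000000 bytes, about 3.26 GiB. By the same arithmetic, unquantized fp16 weights (16 bits per parameter) would need about 14 GB, which suggests 4-bit quantization is what makes an 8 GB card viable here.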

The model isn't all that good; sometimes it goes crazy. But hey, as I always say, "When 4 bits you reach, look as good you will not."

References