
Run LLM chat in real time on an 8 GB NVIDIA GPU

Dockerfile for alpaca_lora_4bit

This repo is a Dockerfile wrapper for https://github.com/johnsmith0031/alpaca_lora_4bit

Use

Runs real-time LLM chat with Alpaca on an 8 GB NVIDIA/CUDA GPU (e.g. a 3070 Ti mobile).

Requirements

  • Linux
  • Docker
  • NVIDIA GPU with a driver that supports CUDA 11.7+ (e.g. driver version 525)
  • NVIDIA Container Toolkit (required by docker run --gpus)
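A quick host check before building can save a failed docker run. The sketch below is a hypothetical helper. The names preflight, parse_cuda_version and driver_supports_cuda are illustrative only and are not part of this repo or of alpaca_lora_4bit. It assumes nvidia-smi is on PATH. It reads the "CUDA Version: X.Y" field in the nvidia-smi header, which reports the newest CUDA version the installed driver supports.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

MIN_CUDA = (11, 7)  # minimum CUDA version from the requirements list


def parse_cuda_version(smi_output: str) -> Optional[Tuple[int, int]]:
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_cuda(smi_output: str, minimum: Tuple[int, int] = MIN_CUDA) -> bool:
    """True if the driver reports a CUDA version >= minimum (tuple comparison)."""
    version = parse_cuda_version(smi_output)
    return version is not None and version >= minimum


def preflight() -> bool:
    """Hypothetical host check: docker, nvidia-smi and a CUDA 11.7+ capable driver."""
    if shutil.which("docker") is None:
        print("docker not found on PATH")
        return False
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; is the NVIDIA driver installed?")
        return False
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=False)
    if not driver_supports_cuda(result.stdout):
        print("Driver does not report CUDA 11.7 or newer")
        return False
    print("Host looks ready")
    return True
```

Call preflight() from a Python shell on the host. It prints the first problem it finds and returns False, or returns True when all three checks pass. It does not verify that the NVIDIA Container Toolkit is installed.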

Installation

git clone https://github.com/andybarry/alpaca_lora_4bit_docker.git
cd alpaca_lora_4bit_docker
docker build -t alpaca_lora_4bit .
docker run --gpus=all -p 7860:7860 alpaca_lora_4bit

Point your browser to http://localhost:7860
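The page may not answer right after docker run starts, because the container still has to load the model. Below is a minimal polling sketch. It assumes only that the container serves HTTP on port 7860 as above. wait_for_server is a hypothetical helper, and any HTTP response, even an error status, is treated as "up".

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str = "http://localhost:7860", timeout: float = 300.0,
                    interval: float = 2.0) -> bool:
    """Poll url until it returns any HTTP response, or give up after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # 2xx response: the server is up
        except urllib.error.HTTPError:
            return True  # non-2xx status, but the server still answered
        except OSError:
            # Connection refused, timeout, etc.: not ready yet, retry shortly.
            time.sleep(interval)
    return False
```

For example, `if wait_for_server(): print("open http://localhost:7860")`. HTTPError is caught before OSError because it is a subclass of it.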

Results

It's fast enough for real-time chat on a 3070 Ti mobile and uses 5-6 GB of GPU memory.
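A back-of-the-envelope check shows why 4-bit weights fit comfortably in 8 GB. The bytes needed for the weights alone are parameters × bits / 8. The sketch assumes a hypothetical 7-billion-parameter model; that size is an assumption, not something stated above. The gap up to the observed 5-6 GB is presumably taken by quantization metadata, activations, the KV cache and the CUDA context, none of which this estimate counts.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int = 4) -> float:
    """Approximate memory for the quantized weights alone, in GiB.

    Ignores quantization scales/zero-points, activations, the KV cache,
    any LoRA adapters and the CUDA context.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Hypothetical 7B-parameter model (size assumed for illustration)
print(f"4-bit:  {weight_memory_gib(7e9, 4):.2f} GiB")   # 4-bit:  3.26 GiB
print(f"16-bit: {weight_memory_gib(7e9, 16):.2f} GiB")  # 16-bit: 13.04 GiB
```

At 16 bits the same hypothetical model would not fit on an 8 GB card at all, which is the point of the 4-bit setup.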

The model isn't all that good; sometimes it goes crazy. But hey, as I always say, "When 4 bits you reach, look as good you will not."
