My personal repo for converting models from LoRA to Hugging Face / GGML / GPTQ 4-bit formats, so I can run them in the normal text-webui and llama.cpp.

How do we do this?

  1. lora -> hf
  2. hf -> 4bit
    • using GPTQ-for-LLaMa/llama.py:
      CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-hf/llama-7b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save llama7b-4bit-128g.pt
  3. 4bit -> ggml
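
For step 1, it may help to see what folding a LoRA into the base weights means. The sketch below is a toy illustration in plain Python, not the repo's export script: it applies the standard LoRA merge rule, merged = W + (alpha / rank) * B @ A, to a made-up 2x2 weight. The helper names (matmul, merge_lora) and all values are hypothetical.

```python
# Toy illustration of merging a LoRA update into a base weight matrix.
# Shapes and values are made up; real checkpoints hold tensors, not lists.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# 2x2 base weight with a rank-1 adapter: B is 2x1, A is 1x2.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]
lora_b = [[0.5], [1.0]]

merged = merge_lora(base, lora_a, lora_b, alpha=2, rank=1)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

Once every adapted layer has been merged this way, the result is an ordinary checkpoint with no adapter attached, which is what steps 2 and 3 take as input.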

TODO

  • lora -> hf
    • test this
  • hf -> 4bit
  • 4bit -> ggml

setup env


conda create -n textgen3 python=3.10.9
conda activate textgen3
mamba install pytorch torchvision torchaudio pytorch-cuda=11.7 cudatoolkit-dev==11.7 cudatoolkit=11.7 -c pytorch -c nvidia -c conda-forge
pip install -r requirements.txt
pip install -e .
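
After the install, a quick sanity check can confirm that PyTorch sees the GPU. This is a minimal sketch, not part of the repo; the function name cuda_status is hypothetical, and it only reports what it finds rather than fixing anything.

```python
def cuda_status():
    """Describe the PyTorch/CUDA setup of the active environment."""
    try:
        import torch  # installed by the mamba command above
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"torch {torch.__version__} found, but CUDA is not available"
    # torch.version.cuda is the CUDA version PyTorch was built against.
    return (
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"device {torch.cuda.get_device_name(0)}"
    )


print(cuda_status())
```

If the reported CUDA version is not 11.7, the environment probably picked up a different PyTorch build than the one requested.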

download models

# base models... FIXME


# download loras
python scripts/download-model.py chansung/alpaca-lora-30b
python scripts/download-model.py chansung/alpaca-lora-13b
python scripts/download-model.py tloen/alpaca-lora-7b

convert models

# download
python scripts/download-model.py tloen/alpaca-lora-7b
python scripts/download-model.py decapoda-research/llama-7b-hf
# convert
python scripts/export_hf_checkpoint.py ./models/llama-7b-hf -l loras/tloen_alpaca-lora-7b
# test
python scripts/test_01_delora.py models/tloen_alpaca-lora-7b-delorified
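
The download, convert and test steps above can be sketched as a small driver for other LoRAs. This is only a sketch under assumptions: the directory naming (loras/<org>_<name> and models/<org>_<name>-delorified) is inferred from the single example above, and local_name, conversion_commands and run are hypothetical helpers, not scripts in the repo. It defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def local_name(repo_id):
    """Map 'tloen/alpaca-lora-7b' to 'tloen_alpaca-lora-7b'."""
    return repo_id.replace("/", "_")


def conversion_commands(lora_repo, base_repo, base_dir):
    """Build the four commands from the example above for one LoRA."""
    lora_dir = f"loras/{local_name(lora_repo)}"
    # Assumption: the export script writes to this path (taken from the example).
    merged_dir = f"models/{local_name(lora_repo)}-delorified"
    return [
        ["python", "scripts/download-model.py", lora_repo],
        ["python", "scripts/download-model.py", base_repo],
        ["python", "scripts/export_hf_checkpoint.py", base_dir, "-l", lora_dir],
        ["python", "scripts/test_01_delora.py", merged_dir],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


cmds = conversion_commands(
    "tloen/alpaca-lora-7b",
    "decapoda-research/llama-7b-hf",
    "./models/llama-7b-hf",
)
run(cmds)  # dry run: prints the commands without executing them
```

Calling run(cmds, dry_run=False) would execute the steps in order from the repo root.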

Links
