wassname/vllm - vllm

Mirror of https://github.com/wassname/vllm.git, synced 2026-07-01 03:10:14 +08:00

Author	SHA1	Message	Date
Simon Mo	ef65dcfa6f	[Doc] Add docs about OpenAI compatible server (#3288)	2024-03-18 22:05:34 -07:00
Philipp Moritz	657061fdce	[docs] Add LoRA support information for models (#3299)	2024-03-11 00:54:51 -07:00
Ganesh Jagadeesan	a8683102cc	multi-lora documentation fix (#3064)	2024-02-27 21:26:15 -08:00
jvmncs	8f36444c4f	multi-LoRA as extra models in OpenAI server (#2775)	2024-02-17 12:00:48 -08:00
Philipp Moritz	4ca2c358b1	Add documentation section about LoRA (#2834)	2024-02-12 17:24:45 +01:00

Full message of 8f36444c4f, "multi-LoRA as extra models in OpenAI server (#2775)":

How to serve the LoRAs (mimicking the [multi-LoRA inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):

```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```

When a client queries `/v1/models`, the server above lists three separate models: one for the base served model and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but they do not have to. LoRA configuration options accept the same values as they do in `EngineArgs`.

No work has been done here to scope client permissions to specific models.
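Commit 8f36444c4f (#2775) exposes each `--lora-modules` name as its own model. As a result, a client selects an adapter by putting that name in the request's `model` field. The following is a minimal standard-library sketch of such a client. It makes several assumptions that do not come from the commit: the OpenAI-compatible server is listening on `localhost:8000`, and the helper names `models_request`, `completion_request` and `send` are hypothetical. The request body follows the OpenAI `/v1/completions` format.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible server from #2775 runs locally on port 8000.
BASE_URL = "http://localhost:8000"


def models_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: GET the served model list (base model plus each LoRA name)."""
    return urllib.request.Request(f"{base_url}/v1/models", method="GET")


def completion_request(model: str, prompt: str, max_tokens: int = 32,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Hypothetical helper: POST a completion that targets a model by name.

    Passing a --lora-modules name such as "sql-lora" as `model` routes the
    prompt through that adapter; passing the base model name uses the base weights.
    """
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens, "temperature": 0}
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send a request and decode its JSON response (needs a running server)."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from that commit running, `send(models_request())["data"]` should contain three entries: `meta-llama/Llama-2-7b-hf`, `sql-lora` and `sql-lora2`. The call `send(completion_request("sql-lora", prompt))["choices"][0]["text"]` then returns text generated through the adapter. The sketch builds the `Request` objects separately from sending them, so the payload can be inspected without a live server.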