Commit Graph

7 Commits

Author SHA1 Message Date
felixzhu555 703e42ee4b Add guided decoding for OpenAI API server (#2819)
Co-authored-by: br3no <breno@veltefaria.de>
Co-authored-by: simon-mo <simon.mo@hey.com>
2024-02-29 22:13:08 +00:00
Dylan Hawk e0ade06d63 Support logit bias for OpenAI API (#3027) 2024-02-27 11:51:53 +08:00
Jared Moore 70f3e8e3a1 Add LogProbs for Chat Completions in OpenAI (#2918) 2024-02-26 10:39:34 +08:00
jvmncs 8f36444c4f multi-LoRA as extra models in OpenAI server (#2775)
how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case. lora config values take the same values they do in EngineArgs

no work has been done here to scope client permissions to specific models
2024-02-17 12:00:48 -08:00
Simon Mo 3a7dd7e367 Support Batch Completion in Server (#2529) 2024-01-24 17:11:07 -08:00
Simon Mo dd7e8f5f64 refactor complemention api for readability (#2499) 2024-01-18 16:45:14 -08:00
FlorianJoncour 14cc317ba4 OpenAI Server refactoring (#2360) 2024-01-16 21:33:14 -08:00