54 Commits

Author SHA1 Message Date
Lalit Pradhan 4c07dd28c0 [🚀 Ready to be merged] Added support for Jais models (#3183) 2024-03-21 09:45:24 +00:00
Simon Mo ef65dcfa6f [Doc] Add docs about OpenAI compatible server (#3288) 2024-03-18 22:05:34 -07:00
Philipp Moritz 657061fdce [docs] Add LoRA support information for models (#3299) 2024-03-11 00:54:51 -07:00
Sage Moore ce4f5a29fb Add Automatic Prefix Caching (#2762)
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
Ganesh Jagadeesan a8683102cc multi-lora documentation fix (#3064) 2024-02-27 21:26:15 -08:00
Woosuk Kwon 8b430d7dea [Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) 2024-02-26 20:23:50 -08:00
张大成 48a8f4a7fd Support Orion model (#2539)
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
Zhuohan Li a9c8212895 [FIX] Add Gemma model to the doc (#2966) 2024-02-21 09:46:15 -08:00
Isotr0py ab3a5a8259 Support OLMo models. (#2832) 2024-02-18 21:05:15 -08:00
jvmncs 8f36444c4f multi-LoRA as extra models in OpenAI server (#2775)
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
 --model meta-llama/Llama-2-7b-hf \
 --enable-lora \
 --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server lists three separate models when a user queries `/models`: one for the base served model and one for each of the specified LoRA modules. In this case, sql-lora and sql-lora2 point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in EngineArgs.

No work has been done here to scope client permissions to specific models.
2024-02-17 12:00:48 -08:00
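Illustrative note on the commit above (8f36444c4f), not part of the original message: a minimal Python sketch of how a client might read the model list that the commit describes. The helper names `served_model_ids` and `fetch_model_ids`, the base URL `http://localhost:8000`, the `/v1/models` route, and the OpenAI-style response shape (`{"object": "list", "data": [{"id": ...}]}`) are assumptions made for illustration; check them against the running server.

```python
import json
from urllib.request import urlopen


def served_model_ids(payload: dict) -> list:
    """Return the "id" of every entry in an OpenAI-style model list.

    Assumes the payload looks like {"object": "list", "data": [{"id": ...}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000") -> list:
    """Query a running server for its model list.

    Hypothetical helper: the base URL and the /v1/models route are assumptions.
    """
    with urlopen(f"{base_url}/v1/models") as resp:
        return served_model_ids(json.load(resp))


# Hand-written sample payload (not captured from a real server) mirroring the
# commit's description: one base model plus two LoRA modules.
sample = {
    "object": "list",
    "data": [
        {"id": "meta-llama/Llama-2-7b-hf", "object": "model"},
        {"id": "sql-lora", "object": "model"},
        {"id": "sql-lora2", "object": "model"},
    ],
}
print(served_model_ids(sample))
```

If the response has that shape, a client can select a LoRA by passing one of the returned names (for example `sql-lora`) as the model name in its request.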
Philipp Moritz 317b29de0f Remove Yi model definition, please use LlamaForCausalLM instead (#2854)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Philipp Moritz 4ca2c358b1 Add documentation section about LoRA (#2834) 2024-02-12 17:24:45 +01:00
Fengzhe Zhou cd9e60c76c Add Internlm2 (#2666) 2024-02-01 09:27:40 -08:00
Junyang Lin 2832e7b9f9 fix names and license for Qwen2 (#2589) 2024-01-24 22:37:51 -08:00
LastWhisper 223c19224b Fix the syntax error in the doc of supported_models (#2584) 2024-01-24 11:22:51 -08:00
Junyang Lin 94b5edeb53 Add qwen2 (#2495) 2024-01-22 14:34:21 -08:00
Hyunsung Lee e1957c6ebd Add StableLM3B model (#2372) 2024-01-16 20:32:40 -08:00
Zhuohan Li fd4ea8ef5c Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Ronen Schaffer c17daa9f89 [Docs] Fix broken links (#2222) 2023-12-20 12:43:42 -08:00
avideci de60a3fb93 Added DeciLM-7b and DeciLM-7b-instruct (#2062) 2023-12-19 02:29:33 -08:00
Suhong Moon 3ec8c25cd0 [Docs] Update documentation for gpu-memory-utilization option (#2162) 2023-12-17 10:51:57 -08:00
Woosuk Kwon f8c688d746 [Minor] Add Phi 2 to supported models (#2159) 2023-12-17 02:54:57 -08:00
Antoni Baum 21d93c140d Optimize Mixtral with expert parallelism (#2090) 2023-12-13 23:55:07 -08:00
Woosuk Kwon 096827c284 [Docs] Add notes on ROCm-supported models (#2087) 2023-12-13 09:45:34 -08:00
Woosuk Kwon 4ff0203987 Minor fixes for Mixtral (#2015) 2023-12-11 09:16:15 -08:00
Peter Götz d940ce497e Fix typo in adding_model.rst (#1947)
adpated -> adapted
2023-12-06 10:04:26 -08:00
Woosuk Kwon e5452ddfd6 Normalize head weights for Baichuan 2 (#1876) 2023-11-30 20:03:58 -08:00
Simon Mo 0f621c2c7d [Docs] Add information about using shared memory in docker (#1845) 2023-11-29 18:33:56 -08:00
Casper a921d8be9d [DOCS] Add engine args documentation (#1741) 2023-11-22 12:31:27 -08:00
liuyhwangyh edb305584b Support download models from www.modelscope.cn (#1588) 2023-11-17 20:38:31 -08:00
Zhuohan Li 0fc280b06c Update the adding-model doc according to the new refactor (#1692) 2023-11-16 18:46:26 -08:00
Zhuohan Li 415d109527 [Fix] Update Supported Models List (#1690) 2023-11-16 14:47:26 -08:00
Usama Ahmed 0967102c6d fixing typo in tiiuae/falcon-rw-7b model name (#1226) 2023-09-29 13:40:25 -07:00
Woosuk Kwon 202351d5bf Add Mistral to supported model list (#1221) 2023-09-28 14:33:04 -07:00
Zhuohan Li 002800f081 Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Woosuk Kwon 55b28b1eee [Docs] Minor fixes in supported models (#920)
* Minor fix in supported models

* Add another small fix for Aquila model

---------

Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
Zhuohan Li 14f9c72bfd Update Supported Model List (#825) 2023-08-22 11:51:44 -07:00
Uranus 1b151ed181 Fix baichuan doc style (#748) 2023-08-13 20:57:31 -07:00
Zhuohan Li f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Zhuohan Li 1b0bd0fe8a Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li df5dd3c68e Add Baichuan-7B to README (#494) 2023-07-25 15:25:12 -07:00
Zhuohan Li 6fc2a38b11 Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Andre Slavescu c894836108 [Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon ffa6d2f9f9 [Docs] Fix typo (#346) 2023-07-03 16:51:47 -07:00
Woosuk Kwon 404422f42e [Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon e41f06702c Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Woosuk Kwon 665c48963b [Docs] Add GPTBigCode to supported models (#213) 2023-06-22 15:05:11 -07:00
Woosuk Kwon 794e578de0 [Minor] Fix URLs (#166) 2023-06-19 22:57:14 -07:00
Woosuk Kwon b7e62d3454 Fix repo & documentation URLs (#163) 2023-06-19 20:03:40 -07:00
Zhuohan Li 0b32a987dd Add and list supported models in README (#161) 2023-06-20 10:57:46 +08:00