wassname/vllm

mirror of https://github.com/wassname/vllm.git synced 2026-06-29 12:09:14 +08:00

Author	SHA1	Message	Date
Breno Faria	87d41c849d	[BUGFIX] [FRONTEND] Correct chat logprobs (#5029) Co-authored-by: Breno Faria <breno.faria@intrafind.com>	2024-05-30 02:52:14 -07:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Alex Wu	52f8107cf2	[Frontend] Support OpenAI batch file format (#4794) Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-05-15 19:13:36 -04:00
Cyrus Leung	fc0d9dfc3a	[Frontend] Re-enable custom roles in Chat Completions API (#4758)	2024-05-15 14:58:46 -07:00
Cyrus Leung	350f9e107f	[CI/Build] Move `test_utils.py` to `tests/utils.py` (#4425) Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time.) Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.	2024-05-13 23:50:09 +09:00
Chang Su	e254497b66	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
Cyrus Leung	f12b20decc	[Frontend] Move async logic outside of constructor (#4674)	2024-05-08 22:48:33 -07:00
Sebastian Schoennenbeck	f8e7adda21	Fix/async chat serving (#2727)	2024-05-03 11:04:14 -07:00
sasha0552	c47ba4aaa9	[Bugfix] Add validation for seed (#4529)	2024-05-01 19:31:22 +00:00
Robert Caulk	c3845d82dc	Allow user to define whitespace pattern for outlines (#4305)	2024-04-30 20:48:39 -07:00
Florian Greinacher	a494140433	[Frontend] Support complex message content for chat completions endpoint (#3467) Co-authored-by: Lily Liu <lilyliupku@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-04-30 16:28:46 -07:00
Cyrus Leung	8947bc3c15	[Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355)	2024-04-27 05:08:24 +00:00
nunjunj	91528575ec	[Frontend] multiple sampling params support (#3570)	2024-04-20 00:11:57 -07:00
Ayush Rautwar	138485a82d	[Bugfix] Add fix for JSON whitespace (#4189) Co-authored-by: Ubuntu <ubuntu@ip-172-31-13-147.ec2.internal>	2024-04-19 20:49:22 -07:00
James Whedbee	e1bb2fd52d	[Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149)	2024-04-18 21:12:55 +00:00
Noam Gat	05434764cd	LM Format Enforcer Guided Decoding Support (#3868) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-16 05:54:57 +00:00
Dylan Hawk	95e7d4a97c	Fix echo/logprob OpenAI completion bug (#3441) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-04-11 22:15:50 +00:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	95baec828f	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
Roy	f510395bbf	[BugFix][Frontend] Fix completion logprobs=0 error (#3731)	2024-03-29 09:38:21 -07:00
Dylan Hawk	0b4997e05c	[Bugfix] API stream returning two stops (#3450) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-03-25 10:14:34 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495)	2024-03-25 07:59:47 -07:00
Simon Mo	120157fd2a	Support arbitrary json_object in OpenAI and Context Free Grammar (#3211)	2024-03-16 13:35:27 -07:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305)	2024-03-10 19:49:14 -07:00
Antoni Baum	22de45235c	Push logprob generation to LLMEngine (#3065) Co-authored-by: Avnish Narayan <avnish@anyscale.com>	2024-03-04 19:54:06 +00:00
felixzhu555	703e42ee4b	Add guided decoding for OpenAI API server (#2819) Co-authored-by: br3no <breno@veltefaria.de> Co-authored-by: simon-mo <simon.mo@hey.com>	2024-02-29 22:13:08 +00:00
Dylan Hawk	e0ade06d63	Support logit bias for OpenAI API (#3027)	2024-02-27 11:51:53 +08:00
Jared Moore	70f3e8e3a1	Add LogProbs for Chat Completions in OpenAI (#2918)	2024-02-26 10:39:34 +08:00
jvmncs	8f36444c4f	multi-LoRA as extra models in OpenAI server (#2775) how to serve the loras (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)): ```terminal $ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/ $ python -m vllm.entrypoints.api_server \ --model meta-llama/Llama-2-7b-hf \ --enable-lora \ --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH ``` the above server will list 3 separate values if the user queries `/models`: one for the base served model, and one each for the specified lora modules. in this case sql-lora and sql-lora2 point to the same underlying lora, but this need not be the case. lora config values take the same values they do in EngineArgs. no work has been done here to scope client permissions to specific models.	2024-02-17 12:00:48 -08:00
Simon Mo	3a7dd7e367	Support Batch Completion in Server (#2529)	2024-01-24 17:11:07 -08:00
Simon Mo	dd7e8f5f64	refactor complemention api for readability (#2499)	2024-01-18 16:45:14 -08:00
FlorianJoncour	14cc317ba4	OpenAI Server refactoring (#2360)	2024-01-16 21:33:14 -08:00

32 Commits