wassname/vllm - vllm

Mirror of https://github.com/wassname/vllm.git, last synced 2026-06-30 10:16:32 +08:00

Author	SHA1	Message	Date
Konrad Zawora	a02a50e6e5	[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143) Signed-off-by: yuwenzho <yuwen.zhou@intel.com> Signed-off-by: Chendi.Xue <chendi.xue@intel.com> Signed-off-by: Bob Zhu <bob.zhu@intel.com> Signed-off-by: zehao-intel <zehao.huang@intel.com> Signed-off-by: Konrad Zawora <kzawora@habana.ai> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai> Co-authored-by: Michal Adamczyk <madamczyk@habana.ai> Co-authored-by: Marceli Fylcek <mfylcek@habana.ai> Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com> Co-authored-by: Vivek Goel <vgoel@habana.ai> Co-authored-by: yuwenzho <yuwen.zhou@intel.com> Co-authored-by: Dominika Olszewska <dolszewska@habana.ai> Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com> Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com> Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai> Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com> Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai> Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com> Co-authored-by: Ilia Taraban <tarabanil@gmail.com> Co-authored-by: Chendi.Xue <chendi.xue@intel.com> Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai> Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai> Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com> Co-authored-by: Sun Choi <schoi@habana.ai> Co-authored-by: Iryna Boiko <iboiko@habana.ai> Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com> Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com> Co-authored-by: Zehao Huang <zehao.huang@intel.com> Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com> Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com> Co-authored-by: Nir David <ndavid@habana.ai> Co-authored-by: Yu-Zhou <yu.zhou@intel.com> Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai> Co-authored-by: Karol Damaszke <kdamaszke@habana.ai> Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Jacek Czaja <jacek.czaja@intel.com> Co-authored-by: Jacek Czaja <jczaja@habana.ai> Co-authored-by: Yuan <yuan.zhou@outlook.com>	2024-11-06 01:09:10 -08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Gene Der Su	7a83b1aec0	[BugFix] Lazy import ray (#10021)	2024-11-05 10:04:10 +00:00
Cyrus Leung	bbc3619dc8	[Core] Make encoder-decoder inputs a nested structure to be more composable (#9604) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-05 10:07:31 +08:00
tomeras91	ac04a97a9f	[Frontend] Add max_tokens prometheus metric (#9881) Signed-off-by: Tomer Asida <tomera@ai21.com>	2024-11-04 22:53:24 +00:00
Robert Shaw	04cef2c6ab	[Bugfix] Fix `MQLLMEngine` hanging (#9973) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-11-04 16:01:43 -05:00
Chauncey	ac6b8f19b9	[Frontend] Multi-Modality Support for Loading Local Image Files (#9915) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2024-11-04 15:34:57 +00:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
Gene Der Su	27cd36e6e2	[Bugfix] PicklingError on RayTaskError (#9934) Signed-off-by: Gene Su <e870252314@gmail.com>	2024-11-01 22:08:23 +00:00
youkaichao	18bd7587b7	[1/N] pass the complete config from engine to executor (#9933) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-01 13:51:57 -07:00
Joe Runde	031a7995f3	[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-01 01:09:46 +00:00
Roger Wang	3ea2dc2ec4	[Misc] Remove deprecated arg for cuda graph capture (#9864) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-10-31 07:22:07 +00:00
Joe Runde	3b3f1e7436	[Bugfix][core] replace heartbeat with pid check (#9818) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-30 09:34:07 -07:00
Went-Liang	81f09cfd80	[Model] Support math-shepherd-mistral-7b-prm model (#9697) Signed-off-by: Went-Liang <wenteng_liang@163.com>	2024-10-30 09:33:42 -07:00
Joe Runde	67bdf8e523	[Bugfix][Frontend] Guard against bad token ids (#9634) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-29 14:13:20 -07:00
Kunjan	0ad216f575	[MISC] Set label value to timestamp over 0, to keep track of recent history (#9777) Signed-off-by: Kunjan Patel <kunjanp@google.com>	2024-10-29 19:52:19 +00:00
科英	74fc2d77ae	[Misc] Add metrics for request queue time, forward time, and execute time (#9659)	2024-10-29 10:32:56 -07:00
Zhong Qishuai	ef7865b4f9	[Frontend] re-enable multi-modality input in the new beam search implementation (#9427) Signed-off-by: Qishuai Ferdinandzhong@gmail.com	2024-10-29 11:49:47 +00:00
Cyrus Leung	e74f2d448c	[Doc] Specify async engine args in docs (#9726)	2024-10-28 22:07:57 -07:00
Robert Shaw	feb92fbe4a	Fix beam search eos (#9627)	2024-10-28 06:59:37 +00:00
madt2709	34a9941620	[Bugfix] Fix load config when using bools (#9533)	2024-10-27 13:46:41 -04:00
Vasiliy Alekseev	07e981fdf4	[Frontend] Bad words sampling parameter (#9717) Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>	2024-10-26 16:29:38 +00:00
youkaichao	4fdc581f9e	[core] simplify seq group code (#9569) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-10-24 00:16:44 -07:00
Vinay R Damodaran	33bab41060	[Bugfix]: Make chat content text allow type content (#9358) Signed-off-by: Vinay Damodaran <vrdn@hey.com>	2024-10-24 05:05:49 +00:00
Tyler Michael Smith	e5ac6a4199	[Bugfix] Fix divide by zero when serving Mamba models (#9617) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-10-23 16:40:43 +00:00
yulei	b17046e298	[BugFix] Fix metrics error for --num-scheduler-steps > 1 (#8234)	2024-10-22 15:43:03 -07:00
Ronen Schaffer	cd5601ac37	[BugFix] Prevent exporting duplicate OpenTelemetry spans (#9017)	2024-10-22 11:11:53 -07:00
Woosuk Kwon	6c5af09b39	[V1] Implement vLLM V1 [1/N] (#9289)	2024-10-22 01:24:07 -07:00
Travis Johnson	b729901139	[Bugfix]: serialize config by value for --trust-remote-code (#6751) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-10-21 19:46:24 -07:00
youkaichao	76a5e13270	[core] move parallel sampling out from vllm core (#9302)	2024-10-22 00:31:44 +00:00
Wallas Henrique	711f3a7806	[Frontend] Don't log duplicate error stacktrace for every request in the batch (#9023) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-21 14:49:41 -07:00
Joe Runde	82c25151ec	[Doc] update gpu-memory-utilization flag docs (#9507) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-10-19 11:26:36 +08:00
Kunjan	9bb10a7d27	[MISC] Add lora requests to metrics (#9477) Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal>	2024-10-18 20:50:18 +00:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424)	2024-10-18 11:31:58 -07:00
Nick Hill	1ffc8a7362	[BugFix] Typing fixes to RequestOutput.prompt and beam search (#9473)	2024-10-18 07:19:53 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Russell Bryant	776dbd74f1	[CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-16 22:55:59 +00:00
Cyrus Leung	cee711fdbb	[Core] Rename input data types (#8688)	2024-10-16 10:49:37 +00:00
Cyrus Leung	7e7eae338d	[Misc] Standardize RoPE handling for Qwen2-VL (#9250)	2024-10-16 13:56:17 +08:00
Brendan Wong	4d31cd424b	[Frontend] merge beam search implementations (#9296)	2024-10-14 15:05:52 -07:00
Wallas Henrique	8baf85e4e9	[Doc] Compatibility matrix for mutual exclusive features (#8512) Signed-off-by: Wallas Santos <wallashss@ibm.com>	2024-10-11 11:18:50 -07:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484)	2024-10-11 15:40:06 +00:00
Sebastian Schoennenbeck	df3dcdf49d	[Bugfix] Fix priority in multiprocessing engine (#9277)	2024-10-11 15:35:35 +00:00
Cyrus Leung	e808156f30	[Misc] Collect model support info in a single process per model (#9233)	2024-10-11 11:08:11 +00:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
Russell Bryant	cf25b93bdd	[Core] Fix invalid args to _process_request (#9201) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-10 12:10:09 +08:00
Alex Brooks	a3691b6b5e	[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2024-10-08 14:12:56 +00:00
Brendan Wong	8c746226c9	[Frontend] API support for beam search for MQLLMEngine (#9117)	2024-10-08 05:51:43 +00:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105)	2024-10-07 05:47:04 +00:00
sroy745	c8f26bb636	[BugFix][Core] Fix BlockManagerV2 when Encoder Input is None (#9103)	2024-10-07 03:52:42 +00:00

426 Commits