wassname/vllm

mirror of https://github.com/wassname/vllm.git synced 2026-06-27 18:45:36 +08:00

Author	SHA1	Message	Date
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Nick Hill	5a2ba16f5c	[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688)	2025-02-23 02:54:29 -08:00
Isotr0py	ba5106e519	[LMM] Implement merged multimodal processor for whisper (#13278)	2025-02-23 01:46:03 -08:00
Kyle Sayers	d5ca2110f1	[Quant] BaiChuan SupportsQuant (#13710)	2025-02-22 19:21:15 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634)	2025-02-22 19:19:45 -08:00
Andy Lo	322d2a27d6	[BugFix] Minor: logger import in attention backend (#13706) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-02-22 16:51:13 -08:00
Roger Wang	82e0d601fc	[CI/Build] Fix pre-commit errors from #13571 (#13709) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-22 16:50:38 -08:00
Daniele	78ac0f591d	[CI/Build] fix uv caching in Dockerfile (#13611)	2025-02-22 08:25:20 -08:00
Yan Ma	b56155e7f3	[XPU]fix setuptools version for xpu (#13548)	2025-02-22 08:05:35 -08:00
Helena Kloosterman	382f66fb08	[Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)	2025-02-22 08:04:12 -08:00
Cyrus Leung	8354f6640c	[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699)	2025-02-22 06:04:31 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)	2025-02-22 05:54:38 -08:00
Sage Moore	558db8083c	[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095)	2025-02-22 05:25:41 -08:00
Kaixi Hou	e109e598c7	[NVIDIA] Support nvfp4 cutlass gemm (#13571)	2025-02-22 05:24:05 -08:00
Keyun Tong	8db1b9d0a1	Support SSL Key Rotation in HTTP Server (#13495)	2025-02-22 05:17:44 -08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696)	2025-02-22 00:31:26 -08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166)	2025-02-22 00:21:30 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295)	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299)	2025-02-22 00:20:00 -08:00
Yu Chin Fabian Lim	fca20841c2	Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660)	2025-02-22 00:19:10 -08:00
Jennifer Zhao	da31b5333e	[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-02-22 00:08:29 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Dipika Sikka	95c617e04b	[Misc] Bump compressed-tensors (#13619)	2025-02-21 22:09:04 -08:00
Shane A	9a1f1da5d1	[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687)	2025-02-21 22:07:45 -08:00
Gordon Wong	68d630a0c7	[ROCM] fix native attention function call (#13650)	2025-02-21 22:07:04 -08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666)	2025-02-21 22:06:34 -08:00
Robin	c6ed93860f	[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672)	2025-02-21 22:05:28 -08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568)	2025-02-21 21:55:50 -08:00
Yuan Tang	8c0dd3d4df	docs: Add a note on full CI run in contributing guide (#13646)	2025-02-21 21:53:59 -08:00
Isotr0py	ada7c780d5	[Misc] Fix yapf linting tools etc not running on pre-commit (#13695) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-22 13:10:43 +08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
John Zheng	900edbfa48	fix typo of grafana dashboard, with correct datasource (#13668) Signed-off-by: John Zheng <john.zheng@hp.com>	2025-02-21 18:21:05 +00:00
Isotr0py	b2c3fc5d65	[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586)	2025-02-20 22:24:17 -08:00
leoneo	839b27c6cc	[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978)	2025-02-20 22:14:24 -08:00
Kevin H. Luu	34ad27fe83	[ci] Fix metrics test model path (#13635)	2025-02-20 22:12:10 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846)	2025-02-20 22:09:47 -08:00
Szymon Ożóg	1cdc88614a	Missing comment explaining VDR variable in GGUF kernels (#13290)	2025-02-20 22:06:54 -08:00
Nick Hill	31aa045c11	[V1][Sampler] Avoid an operation during temperature application (#13587)	2025-02-20 22:05:56 -08:00
Roger Wang	a30c093502	[Bugfix] Add `mm_processor_kwargs` to chat-related protocols (#13644)	2025-02-20 22:04:33 -08:00
Harry Mellor	c7b07a95a6	Use pre-commit to update `requirements-test.txt` (#13617)	2025-02-20 22:03:27 -08:00
Kaixi Hou	27a09dc52c	[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)	2025-02-20 22:01:48 -08:00
Edwin Hernandez	981f3c831e	[Misc] Adding script to setup ray for multi-node vllm deployments (#12913)	2025-02-20 21:16:40 -08:00
Kante Yin	44c33f01f3	Add llmaz as another integration (#13643) Signed-off-by: kerthcet <kerthcet@gmail.com>	2025-02-21 03:52:40 +00:00
Lingfan Yu	33170081f1	[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth (#13245) Signed-off-by: Lingfan Yu <lingfany@amazon.com>	2025-02-20 17:45:45 -08:00


4763 Commits