wassname/vllm - vllm - Gitea: Git with a cup of tea

mirror of https://github.com/wassname/vllm.git synced 2026-06-29 07:42:35 +08:00

Author	SHA1	Message	Date
Michael Goin	db986c19ea	Fix precommit fail in fused_moe intermediate_cache2 chunking (#13772) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-24 09:25:47 -08:00
Roger Wang	227578480d	Revert "[V1][Core] Fix memory issue with logits & sampling" (#13775)	2025-02-24 09:16:05 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Nicolò Lucchesi	444b0f0f62	[Misc][Docs] Raise error when flashinfer is not installed and `VLLM_ATTENTION_BACKEND` is set (#12513) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-02-24 10:43:21 -05:00
Zhonghua Deng	ccc00515fd	[BugFix] Illegal memory access for MoE On H20 (#13693)	2025-02-24 07:37:32 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583)	2025-02-24 07:33:20 -08:00
Roger Meier	7940d8a6a7	[CI/Build] add python-json-logger to requirements-common (#12842)	2025-02-24 06:10:33 -08:00
Roger Meier	c0e3ecd6d2	[Bugfix] fix(logging): add missing opening square bracket (#13011)	2025-02-24 06:10:25 -08:00
Mengqing Cao	23eca9cf68	[model][refactor] remove cuda hard code in models and layers (#13658)	2025-02-24 06:10:14 -08:00
Roger Wang	437b76ff59	[V1][Core] Fix memory issue with logits & sampling (#13721)	2025-02-24 06:10:06 -08:00
Kevin H. Luu	f90a375593	[ci] Add logic to change model to S3 path only when S3 CI env var is on (#13727) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal>	2025-02-24 06:32:11 +00:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Roger Wang	9bebc9512f	[Misc] Deprecate `--dataset` from `benchmark_serving.py` (#13708) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-23 13:32:20 +00:00
Nick Hill	5a2ba16f5c	[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms (#13688)	2025-02-23 02:54:29 -08:00
Isotr0py	ba5106e519	[LMM] Implement merged multimodal processor for whisper (#13278)	2025-02-23 01:46:03 -08:00
Kyle Sayers	d5ca2110f1	[Quant] BaiChuan SupportsQuant (#13710)	2025-02-22 19:21:15 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634)	2025-02-22 19:19:45 -08:00
Andy Lo	322d2a27d6	[BugFix] Minor: logger import in attention backend (#13706) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-02-22 16:51:13 -08:00
Roger Wang	82e0d601fc	[CI/Build] Fix pre-commit errors from #13571 (#13709) Signed-off-by: Roger Wang <ywang@roblox.com>	2025-02-22 16:50:38 -08:00
Daniele	78ac0f591d	[CI/Build] fix uv caching in Dockerfile (#13611)	2025-02-22 08:25:20 -08:00
Yan Ma	b56155e7f3	[XPU]fix setuptools version for xpu (#13548)	2025-02-22 08:05:35 -08:00
Helena Kloosterman	382f66fb08	[Bugfix] Fix boolean conversion for OpenVINO env variable (#13615)	2025-02-22 08:04:12 -08:00
Cyrus Leung	8354f6640c	[Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699)	2025-02-22 06:04:31 -08:00
Gregory Shtrasberg	c904fdddf6	[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm (#13231)	2025-02-22 05:54:38 -08:00
Sage Moore	558db8083c	[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths (#13095)	2025-02-22 05:25:41 -08:00
Kaixi Hou	e109e598c7	[NVIDIA] Support nvfp4 cutlass gemm (#13571)	2025-02-22 05:24:05 -08:00
Keyun Tong	8db1b9d0a1	Support SSL Key Rotation in HTTP Server (#13495)	2025-02-22 05:17:44 -08:00
youkaichao	2382ad29d1	[ci] fix linter (#13701) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 20:28:59 +08:00
youkaichao	3e472d882a	[core] set up data parallel communication (#13591) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-22 19:28:59 +08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696)	2025-02-22 00:31:26 -08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166)	2025-02-22 00:21:30 -08:00
Mark McLoughlin	2cb8c1540e	[Metrics] Add `--show-hidden-metrics-for-version` CLI arg (#13295)	2025-02-22 00:20:45 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299)	2025-02-22 00:20:00 -08:00
Yu Chin Fabian Lim	fca20841c2	Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size (#13660)	2025-02-22 00:19:10 -08:00
Jennifer Zhao	da31b5333e	[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler (#13594) Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2025-02-22 00:08:29 -08:00
Lu Fang	bb78fb318e	[v1] Support allowed_token_ids in v1 Sampler (#13210) Signed-off-by: Lu Fang <lufang@fb.com>	2025-02-22 14:13:05 +08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Dipika Sikka	95c617e04b	[Misc] Bump compressed-tensors (#13619)	2025-02-21 22:09:04 -08:00
Shane A	9a1f1da5d1	[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA (#13687)	2025-02-21 22:07:45 -08:00
Gordon Wong	68d630a0c7	[ROCM] fix native attention function call (#13650)	2025-02-21 22:07:04 -08:00
Jun Duan	68d535ef44	[Misc] Capture and log the time of loading weights (#13666)	2025-02-21 22:06:34 -08:00
Robin	c6ed93860f	[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… (#13672)	2025-02-21 22:05:28 -08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568)	2025-02-21 21:55:50 -08:00
Yuan Tang	8c0dd3d4df	docs: Add a note on full CI run in contributing guide (#13646)	2025-02-21 21:53:59 -08:00
Isotr0py	ada7c780d5	[Misc] Fix yapf linting tools etc not running on pre-commit (#13695) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-02-22 13:10:43 +08:00
Lucas Wilkinson	288cc6c234	[Attention] MLA with chunked prefill (#12639) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Patrick Horn <patrick.horn@gmail.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-21 15:30:12 -08:00
John Zheng	900edbfa48	fix typo of grafana dashboard, with correct datasource (#13668) Signed-off-by: John Zheng <john.zheng@hp.com>	2025-02-21 18:21:05 +00:00
Isotr0py	b2c3fc5d65	[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation (#13586)	2025-02-20 22:24:17 -08:00

4774 Commits