wassname/vllm - vllm - Gitea: Git with a cup of tea

mirror of https://github.com/wassname/vllm.git synced 2026-07-05 22:19:46 +08:00

Author	SHA1	Message	Date
Cyrus Leung	f230cc2ca6	[Bugfix] Fix broadcasting logic for `multi_modal_kwargs` (#6836)	2024-07-31 10:38:45 +08:00
Cyrus Leung	da1f7cc12a	[mypy] Enable following imports for some directories (#6681)	2024-07-31 10:38:03 +08:00
Cade Daniel	c32ab8be1a	[Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964)	2024-07-31 00:53:21 +00:00
Cade Daniel	fb4f530bf5	[CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706)	2024-07-30 16:28:49 -07:00
Cade Daniel	79319cedfa	[Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965)	2024-07-30 16:28:05 -07:00
Simon Mo	40c27a7cbb	[Build] Temporarily Disable Kernels and LoRA tests (#6961)	2024-07-30 14:59:48 -07:00
youkaichao	6ca8031e71	[core][misc] improve free_finished_seq_groups (#6865) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-07-30 14:32:12 -07:00
Tyler Michael Smith	d7a299edaa	[Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842)	2024-07-30 16:37:01 -04:00
Sanger Steel	052b6f8ca4	[Bugfix] Fix tensorizer memory profiling bug during testing (#6881)	2024-07-30 11:48:50 -07:00
Ilya Lavrenov	5895b24677	[OpenVINO] Updated OpenVINO requirements and build docs (#6948)	2024-07-30 11:33:01 -07:00
Tyler Michael Smith	cbbc904470	[Kernel] Squash a few more warnings (#6914)	2024-07-30 13:50:42 -04:00
Nick Hill	5cf9254a9c	[BugFix] Fix use of per-request seed with pipeline parallel (#6698)	2024-07-30 10:40:08 -07:00
fzyzcjy	f058403683	[Doc] Super tiny fix doc typo (#6949)	2024-07-30 09:14:03 -07:00
Roger Wang	c66c7f86ac	[Bugfix] Fix PaliGemma MMP (#6930)	2024-07-30 02:20:57 -07:00
Woosuk Kwon	6e063ea35b	[TPU] Fix greedy decoding (#6933)	2024-07-30 02:06:29 -07:00
Varun Sundar Rabindranath	af647fb8b3	[Kernel] Tuned int8 kernels for Ada Lovelace (#6848) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-07-29 20:24:58 -06:00
Tyler Michael Smith	61a97c32f6	[Kernel] Fix marlin divide-by-zero warnings (#6904)	2024-07-30 01:26:07 +00:00
Kevin H. Luu	4fbf4aa128	[ci] GHA workflow to remove ready label upon "/notready" comment (#6921) Signed-off-by: kevin <kevin@anyscale.com>	2024-07-29 17:03:45 -07:00
Tyler Michael Smith	aae6d36f7e	[Kernel] Remove unused variables in awq/gemm_kernels.cu (#6908)	2024-07-29 18:01:17 -06:00
Nick Hill	9f69d8245a	[Frontend] New `allowed_token_ids` decoding request parameter (#6753)	2024-07-29 23:37:27 +00:00
Thomas Parnell	9a7e2d0534	[Bugfix] Allow vllm to still work if triton is not installed. (#6786) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-29 14:51:27 -07:00
Earthwalker	7f8d612d24	[TPU] Support tensor parallelism in async llm engine (#6891)	2024-07-29 12:42:21 -07:00
Tyler Michael Smith	60d1c6e584	[Kernel] Fix deprecation function warnings squeezellm quant_cuda_kernel (#6901)	2024-07-29 09:59:02 -07:00
Peng Guanwen	db9e5708a9	[Core] Reduce unnecessary compute when logprobs=None (#6532)	2024-07-29 16:47:31 +00:00
Varun Sundar Rabindranath	766435e660	[Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-07-29 09:42:35 -06:00
Isotr0py	7cbd9ec7a9	[Model] Initialize support for InternVL2 series models (#6514) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-29 10:16:30 +00:00
Elsa Granger	3eeb148f46	[Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871)	2024-07-28 11:13:49 -04:00
Michael Goin	b1366a9534	Add Nemotron to PP_SUPPORTED_MODELS (#6863)	2024-07-27 15:05:17 -07:00
Alexander Matveev	75acdaa4b6	[Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795)	2024-07-27 17:52:33 -04:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)	2024-07-27 10:28:33 -07:00
Chenggang Wu	f954d0715c	[Docs] Add RunLLM chat widget (#6857)	2024-07-27 09:24:46 -07:00
Cyrus Leung	1ad86acf17	[Model] Initial support for BLIP-2 (#5920) Co-authored-by: ywang96 <ywang@roblox.com>	2024-07-27 11:53:07 +00:00
Roger Wang	ecb33a28cb	[CI/Build][Doc] Update CI and Doc for VLM example changes (#6860)	2024-07-27 09:54:14 +00:00
Wang Ran (汪然)	a57d75821c	[bugfix] make args.stream work (#6831)	2024-07-27 09:07:02 +00:00
Roger Wang	925de97e05	[Bugfix] Fix VLM example typo (#6859)	2024-07-27 14:24:08 +08:00
Roger Wang	aa46953a20	[Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858) Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2024-07-26 22:44:13 -07:00
Travis Johnson	593e79e733	[Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802) [Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-07-26 22:15:20 -07:00
Harry Mellor	c53041ae3b	[Doc] Add missing mock import to docs `conf.py` (#6834)	2024-07-27 04:47:33 +00:00
Woosuk Kwon	52f07e3dec	[Hardware][TPU] Implement tensor parallelism with Ray (#5871)	2024-07-26 20:54:27 -07:00
Joe	14dbd5a767	[Model] H2O Danube3-4b (#6451)	2024-07-26 20:47:50 -07:00
tomeras91	ed94e4f427	[Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba (#6784)	2024-07-26 20:45:31 -07:00
omrishiv	3c3012398e	[Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844) Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>	2024-07-26 20:20:16 -07:00
Woosuk Kwon	ced36cd89b	[ROCm] Upgrade PyTorch nightly version (#6845)	2024-07-26 20:16:13 -07:00
Sanger Steel	969d032265	[Bugfix]: Fix Tensorizer test failures (#6835)	2024-07-26 20:02:25 -07:00
Lucas Wilkinson	55712941e5	[Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852)	2024-07-27 02:27:44 +00:00
Cyrus Leung	981b0d5673	[Frontend] Factor out code for running uvicorn (#6828)	2024-07-27 09:58:25 +08:00
Woosuk Kwon	d09b94ca58	[TPU] Support collective communications in XLA devices (#6813)	2024-07-27 01:45:57 +00:00
chenqianfzh	bb5494676f	enforce eager mode with bnb quantization temporarily (#6846)	2024-07-27 01:32:20 +00:00
Gurpreet Singh Dhami	b5f49ee55b	Update README.md (#6847)	2024-07-27 00:26:45 +00:00
Zhanghao Wu	150a1ffbfd	[Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283)	2024-07-26 14:39:10 -07:00

2132 Commits