Commit Graph

2132 Commits

Author SHA1 Message Date
Cyrus Leung f230cc2ca6 [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) 2024-07-31 10:38:45 +08:00
Cyrus Leung da1f7cc12a [mypy] Enable following imports for some directories (#6681) 2024-07-31 10:38:03 +08:00
Cade Daniel c32ab8be1a [Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964) 2024-07-31 00:53:21 +00:00
Cade Daniel fb4f530bf5 [CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706) 2024-07-30 16:28:49 -07:00
Cade Daniel 79319cedfa [Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965) 2024-07-30 16:28:05 -07:00
Simon Mo 40c27a7cbb [Build] Temporarily Disable Kernels and LoRA tests (#6961) 2024-07-30 14:59:48 -07:00
youkaichao 6ca8031e71 [core][misc] improve free_finished_seq_groups (#6865)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-07-30 14:32:12 -07:00
Tyler Michael Smith d7a299edaa [Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842) 2024-07-30 16:37:01 -04:00
Sanger Steel 052b6f8ca4 [Bugfix] Fix tensorizer memory profiling bug during testing (#6881) 2024-07-30 11:48:50 -07:00
Ilya Lavrenov 5895b24677 [OpenVINO] Updated OpenVINO requirements and build docs (#6948) 2024-07-30 11:33:01 -07:00
Tyler Michael Smith cbbc904470 [Kernel] Squash a few more warnings (#6914) 2024-07-30 13:50:42 -04:00
Nick Hill 5cf9254a9c [BugFix] Fix use of per-request seed with pipeline parallel (#6698) 2024-07-30 10:40:08 -07:00
fzyzcjy f058403683 [Doc] Super tiny fix doc typo (#6949) 2024-07-30 09:14:03 -07:00
Roger Wang c66c7f86ac [Bugfix] Fix PaliGemma MMP (#6930) 2024-07-30 02:20:57 -07:00
Woosuk Kwon 6e063ea35b [TPU] Fix greedy decoding (#6933) 2024-07-30 02:06:29 -07:00
Varun Sundar Rabindranath af647fb8b3 [Kernel] Tuned int8 kernels for Ada Lovelace (#6848)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-29 20:24:58 -06:00
Tyler Michael Smith 61a97c32f6 [Kernel] Fix marlin divide-by-zero warnings (#6904) 2024-07-30 01:26:07 +00:00
Kevin H. Luu 4fbf4aa128 [ci] GHA workflow to remove ready label upon "/notready" comment (#6921)
Signed-off-by: kevin <kevin@anyscale.com>
2024-07-29 17:03:45 -07:00
Tyler Michael Smith aae6d36f7e [Kernel] Remove unused variables in awq/gemm_kernels.cu (#6908) 2024-07-29 18:01:17 -06:00
Nick Hill 9f69d8245a [Frontend] New allowed_token_ids decoding request parameter (#6753) 2024-07-29 23:37:27 +00:00
Thomas Parnell 9a7e2d0534 [Bugfix] Allow vllm to still work if triton is not installed. (#6786)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-07-29 14:51:27 -07:00
Earthwalker 7f8d612d24 [TPU] Support tensor parallelism in async llm engine (#6891) 2024-07-29 12:42:21 -07:00
Tyler Michael Smith 60d1c6e584 [Kernel] Fix deprecation function warnings squeezellm quant_cuda_kernel (#6901) 2024-07-29 09:59:02 -07:00
Peng Guanwen db9e5708a9 [Core] Reduce unnecessary compute when logprobs=None (#6532) 2024-07-29 16:47:31 +00:00
Varun Sundar Rabindranath 766435e660 [Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-29 09:42:35 -06:00
Isotr0py 7cbd9ec7a9 [Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
Elsa Granger 3eeb148f46 [Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871) 2024-07-28 11:13:49 -04:00
Michael Goin b1366a9534 Add Nemotron to PP_SUPPORTED_MODELS (#6863) 2024-07-27 15:05:17 -07:00
Alexander Matveev 75acdaa4b6 [Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795) 2024-07-27 17:52:33 -04:00
Woosuk Kwon fad5576c58 [TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
Chenggang Wu f954d0715c [Docs] Add RunLLM chat widget (#6857) 2024-07-27 09:24:46 -07:00
Cyrus Leung 1ad86acf17 [Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
Roger Wang ecb33a28cb [CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
Wang Ran (汪然) a57d75821c [bugfix] make args.stream work (#6831) 2024-07-27 09:07:02 +00:00
Roger Wang 925de97e05 [Bugfix] Fix VLM example typo (#6859) 2024-07-27 14:24:08 +08:00
Roger Wang aa46953a20 [Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-26 22:44:13 -07:00
Travis Johnson 593e79e733 [Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802)
[Bugfix] Use torch.set_num_threads() to configure parallelism in multiproc_gpu_executor (#6802)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-07-26 22:15:20 -07:00
Harry Mellor c53041ae3b [Doc] Add missing mock import to docs conf.py (#6834) 2024-07-27 04:47:33 +00:00
Woosuk Kwon 52f07e3dec [Hardware][TPU] Implement tensor parallelism with Ray (#5871) 2024-07-26 20:54:27 -07:00
Joe 14dbd5a767 [Model] H2O Danube3-4b (#6451) 2024-07-26 20:47:50 -07:00
tomeras91 ed94e4f427 [Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba (#6784) 2024-07-26 20:45:31 -07:00
omrishiv 3c3012398e [Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
Woosuk Kwon ced36cd89b [ROCm] Upgrade PyTorch nightly version (#6845) 2024-07-26 20:16:13 -07:00
Sanger Steel 969d032265 [Bugfix]: Fix Tensorizer test failures (#6835) 2024-07-26 20:02:25 -07:00
Lucas Wilkinson 55712941e5 [Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852) 2024-07-27 02:27:44 +00:00
Cyrus Leung 981b0d5673 [Frontend] Factor out code for running uvicorn (#6828) 2024-07-27 09:58:25 +08:00
Woosuk Kwon d09b94ca58 [TPU] Support collective communications in XLA devices (#6813) 2024-07-27 01:45:57 +00:00
chenqianfzh bb5494676f enforce eager mode with bnb quantization temporarily (#6846) 2024-07-27 01:32:20 +00:00
Gurpreet Singh Dhami b5f49ee55b Update README.md (#6847) 2024-07-27 00:26:45 +00:00
Zhanghao Wu 150a1ffbfd [Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) 2024-07-26 14:39:10 -07:00