62 Commits

Author SHA1 Message Date
Harry Mellor cf069aa8aa Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
Daniele 81dabf24a8 [CI/Build] force writing version file (#13544)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-02-19 18:48:03 +08:00
Daniele a02c86b4dd [CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772) 2025-02-18 08:02:49 -08:00
Tyler Michael Smith 067678262a [Bugfix][CI] Inherit codespell settings from pyproject.toml in the pre-commit-config (#13237) 2025-02-13 20:19:43 -08:00
Russell Bryant 8eafe5eaea [CI/Build] Ignore ruff warning up007 (#13182)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-13 11:48:31 +08:00
Harry Mellor dd6a3a02cb [Doc] Convert docs to use colon fences (#12471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-29 11:38:29 +08:00
Harry Mellor 3ea7b94523 Move linting to pre-commit (#11975)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
Rafael Vasquez 43f3d9e699 [CI/Build] Add markdown linter (#11857)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2025-01-12 00:17:13 -08:00
Harry Mellor aba8d6ee00 [Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00
Dipika Sikka 60508ffda9 [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
Cyrus Leung d1e21a979b [CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-12 06:18:16 +08:00
Wallas Henrique c27df94e1f [Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-11-25 12:23:32 -05:00
youkaichao 377b74fe87 Revert "[ci][build] limit cmake version" (#10271) 2024-11-12 15:06:48 -08:00
youkaichao 9fa4bdde9d [ci][build] limit cmake version (#10188)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 16:27:26 -08:00
Cyrus Leung 51c2e1fcef [CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-09 11:39:14 -08:00
Florian Zimmermeister e1b5a82179 Rename vllm.logging to vllm.logging_utils (#10134) 2024-11-08 20:53:24 +00:00
Cyrus Leung b489fc3c91 [CI/Build] Update CPU tests to include all "standard" tests (#5481)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-08 23:30:04 +08:00
Russell Bryant e036e527a0 [CI/Build] Improve mypy + python version matrix (#10041)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-11-07 07:54:16 +00:00
Joe Runde d58268c56a [V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-06 11:57:35 -08:00
Aaron Pham 21063c11c7 [CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
bnellnm d93478b399 [Bugfix] Upgrade to pytorch 2.5.1 (#10001)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-04 15:11:28 -08:00
bnellnm 3cb07a36a2 [Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
Daniele 203ab8f80f [CI/Build] setuptools-scm fixes (#8900) 2024-10-14 11:34:47 -07:00
Daniele 2467b642dd [CI/Build] fix setuptools-scm usage (#8771) 2024-09-24 12:38:12 -07:00
Daniele ee5f34b1c2 [CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Aaron Pham 9d104b5beb [CI/Build] Update Ruff version (#8469)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-18 11:00:56 +00:00
Cyrus Leung a84e598e21 [CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Cyrus Leung 8427550488 [CI/Build] Update pixtral tests to use JSON (#8436) 2024-09-13 03:47:52 +00:00
Jonathan Berkhahn 9c71c97ae2 [mypy] Enable mypy type checking for vllm/core (#7229) 2024-08-28 07:11:14 +08:00
sasha0552 dd3fa0e430 [Bugfix] Mirror jinja2 in pyproject.toml (#7723) 2024-08-21 13:41:17 +00:00
Cyrus Leung baaedfdb2d [mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Daniele 774cd1d3bf [CI/Build] bump minimum cmake version (#6999) 2024-08-12 16:29:20 -07:00
Sage Moore 7e0861bd0b [CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-01 11:11:24 -07:00
Simon Mo 7eb0cb4a14 Revert "[Frontend] Factor out code for running uvicorn" (#7012)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-07-31 16:34:26 -07:00
Cyrus Leung da1f7cc12a [mypy] Enable following imports for some directories (#6681) 2024-07-31 10:38:03 +08:00
Cody Yu aa48e502fb [MISC] Upgrade dependency to PyTorch 2.3.1 (#5327) 2024-07-12 12:04:26 -07:00
Cyrus Leung 9d47f64eb6 [CI/Build] [3/3] Reorganize entrypoints tests (#5966) 2024-06-30 12:58:49 +08:00
Roger Wang 4ad7b53e59 [CI/Build][Misc] Update Pytest Marker for VLMs (#5623) 2024-06-18 13:10:04 +00:00
Cyrus Leung 89c920785f [CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Tyler Michael Smith 260d119e86 [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137) 2024-06-01 06:45:32 +00:00
Cyrus Leung 5ae5ed1e60 [Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Michael Goin 757b62c495 [CI/Build] Codespell ignore build/ directory (#4945) 2024-05-21 09:06:10 -07:00
SangBin Cho 2e9a2227ec [Lora] Support long context lora (#4787)
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
Michael Goin d627a3d837 [Misc] Upgrade to torch==2.3.0 (#4454) 2024-04-29 20:05:47 -04:00
SangBin Cho a88081bf76 [CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
SangBin Cho 0ae11f78ab [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) 2024-04-22 21:32:44 -07:00
SangBin Cho 09473ee41c [mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
youkaichao ca81ff5196 [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) 2024-04-04 10:26:19 -07:00
Adrian Abeyta 2ff767b513 Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Roger Wang 45b6ef6513 feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) 2024-03-27 13:39:26 -07:00