Commit Graph

43 Commits

Author SHA1 Message Date
Aaron Pham 21063c11c7 [CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
bnellnm d93478b399 [Bugfix] Upgrade to pytorch 2.5.1 (#10001)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
2024-11-04 15:11:28 -08:00
bnellnm 3cb07a36a2 [Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
Daniele 203ab8f80f [CI/Build] setuptools-scm fixes (#8900) 2024-10-14 11:34:47 -07:00
Daniele 2467b642dd [CI/Build] fix setuptools-scm usage (#8771) 2024-09-24 12:38:12 -07:00
Daniele ee5f34b1c2 [CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
Aaron Pham 9d104b5beb [CI/Build] Update Ruff version (#8469)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-18 11:00:56 +00:00
Cyrus Leung a84e598e21 [CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Cyrus Leung 8427550488 [CI/Build] Update pixtral tests to use JSON (#8436) 2024-09-13 03:47:52 +00:00
Jonathan Berkhahn 9c71c97ae2 [mypy] Enable mypy type checking for vllm/core (#7229) 2024-08-28 07:11:14 +08:00
sasha0552 dd3fa0e430 [Bugfix] Mirror jinja2 in pyproject.toml (#7723) 2024-08-21 13:41:17 +00:00
Cyrus Leung baaedfdb2d [mypy] Enable following imports for entrypoints (#7248)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
Daniele 774cd1d3bf [CI/Build] bump minimum cmake version (#6999) 2024-08-12 16:29:20 -07:00
Sage Moore 7e0861bd0b [CI/Build] Update PyTorch to 2.4.0 (#6951)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-01 11:11:24 -07:00
Simon Mo 7eb0cb4a14 Revert "[Frontend] Factor out code for running uvicorn" (#7012)
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-07-31 16:34:26 -07:00
Cyrus Leung da1f7cc12a [mypy] Enable following imports for some directories (#6681) 2024-07-31 10:38:03 +08:00
Cody Yu aa48e502fb [MISC] Upgrade dependency to PyTorch 2.3.1 (#5327) 2024-07-12 12:04:26 -07:00
Cyrus Leung 9d47f64eb6 [CI/Build] [3/3] Reorganize entrypoints tests (#5966) 2024-06-30 12:58:49 +08:00
Roger Wang 4ad7b53e59 [CI/Build][Misc] Update Pytest Marker for VLMs (#5623) 2024-06-18 13:10:04 +00:00
Cyrus Leung 89c920785f [CI/Build] Update vision tests (#5307) 2024-06-06 05:17:18 -05:00
Tyler Michael Smith 260d119e86 [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137) 2024-06-01 06:45:32 +00:00
Cyrus Leung 5ae5ed1e60 [Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Michael Goin 757b62c495 [CI/Build] Codespell ignore build/ directory (#4945) 2024-05-21 09:06:10 -07:00
SangBin Cho 2e9a2227ec [Lora] Support long context lora (#4787)
Currently we need to call the rotary embedding kernel for each LoRA, which makes it hard to serve multiple long-context-length LoRAs. Add a batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.

Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
Michael Goin d627a3d837 [Misc] Upgrade to torch==2.3.0 (#4454) 2024-04-29 20:05:47 -04:00
SangBin Cho a88081bf76 [CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
SangBin Cho 0ae11f78ab [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) 2024-04-22 21:32:44 -07:00
SangBin Cho 09473ee41c [mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
youkaichao ca81ff5196 [Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805) 2024-04-04 10:26:19 -07:00
Adrian Abeyta 2ff767b513 Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
Roger Wang 45b6ef6513 feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277) 2024-03-27 13:39:26 -07:00
SangBin Cho 01bfb22b41 [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
bnellnm 9fdf3de346 Cmake based build system (#2830) 2024-03-18 15:38:33 -07:00
Ronen Schaffer 14e3f9a1b2 Replace lstrip() with removeprefix() to fix Ruff linter warning (#2958) 2024-03-15 21:01:30 -07:00
Zhuohan Li 2f8844ba08 Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Massimiliano Pronesti 93dc5a2870 chore(vllm): codespell for spell checking (#2820) 2024-02-21 18:56:01 -08:00
Woosuk Kwon b0a1d667b0 Pin PyTorch & xformers versions (#2155) 2023-12-17 01:46:54 -08:00
Woosuk Kwon f3e024bece [CI/CD] Upgrade PyTorch version to v2.1.1 (#2045) 2023-12-11 17:48:11 -08:00
Allen f07c1ceaa5 [FIX] Fix docker build error (#1831) (#1832)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-11-29 23:06:50 -08:00
Simon Mo 5ffc0d13a2 Migrate linter from pylint to ruff (#1665) 2023-11-20 11:58:01 -08:00
Zhuohan Li 06458a0b42 Upgrade to CUDA 12 (#1527)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-11-08 14:17:49 -08:00
yanxiyue 6a6119554c lock torch version to 2.0.1 (#1290) 2023-10-10 09:21:57 -07:00
Woosuk Kwon 376725ce74 [PyPI] Packaging for PyPI distribution (#140) 2023-06-05 20:03:14 -07:00