Commit Graph

43 Commits

Author SHA1 Message Date
Harry Mellor cf069aa8aa Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
Gregory Shtrasberg aabeb2688f [ROCm][Quantization][Kernel] Using HIP FP8 header (#12593) 2025-02-25 00:39:59 -08:00
Lucas Wilkinson 288cc6c234 [Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
Lucas Wilkinson 75e94309e8 [Perf] Mem align KV caches for CUDA devices (MLA perf improvement) (#12676)
Signed-off-by: simon-mo <xmo@berkeley.edu>
Signed-off-by: Lucas Wilkinson <lcwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
2025-02-04 18:22:24 -08:00
Russell Bryant e489ad7a21 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Gregory Shtrasberg e97f802b2d [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
Michael Goin 235366fe2e [CI] Prune back the number of tests in tests/kernels/* (#9932)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-05 16:02:32 -05:00
Pavani Majety 598b6d7b07 [Bugfix/Core] Flashinfer k_scale and v_scale (#9861) 2024-11-01 12:15:05 -07:00
wangshuai09 622b7ab955 [Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
Cyrus Leung 6ffa3f314c [CI/Build] Avoid CUDA initialization (#8534) 2024-09-18 10:38:11 +00:00
bnellnm 73202dbe77 [Kernel][Misc] register ops to prevent graph breaks (#6917)
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
jon-chuang 50b8d08dbd [Misc/Testing] Use torch.testing.assert_close (#7324) 2024-08-16 04:24:04 +00:00
Joe 14dbd5a767 [Model] H2O Danube3-4b (#6451) 2024-07-26 20:47:50 -07:00
Antoni Baum 0e63494cf3 Add fp8 support to reshape_and_cache_flash (#6667) 2024-07-24 18:36:52 +00:00
Michael Goin 978aed5300 [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) 2024-07-16 15:31:32 -07:00
Cyrus Leung 0e9164b40a [mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
SnowDist a22dea54d3 [Model] Support MAP-NEO model (#5081)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-05-30 19:24:41 -07:00
Cody Yu c833101740 [Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535) 2024-05-09 18:04:17 -06:00
youkaichao 230c4b38c1 [CI/Test] fix swap test for multi gpu (#4689) 2024-05-08 13:14:02 -07:00
youkaichao 20cfcdec99 [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) 2024-05-08 12:07:05 -07:00
youkaichao 63575bc2e1 [Core][Optimization] change python dict to pytorch tensor (#4607) 2024-05-06 21:30:27 -07:00
Lily Liu 43c413ec57 [Kernel] Use flashinfer for decoding (#4353)
Co-authored-by: LiuXiaoxuanPKU <llilyliupku@gmail.com>
2024-05-03 15:51:27 -07:00
Kunshang Ji e9da5a40c6 [Misc] Add indirection layer for custom ops (#3913) 2024-04-10 20:26:07 -07:00
Adrian Abeyta 2ff767b513 Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
SangBin Cho 01bfb22b41 [CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
youkaichao 8b268a46a7 [CI] typo fix: is_hip --> is_hip() (#3595) 2024-03-24 16:03:06 -07:00
Lily Liu fe6d09ae61 [Minor] More fix of test_cache.py CI test failure (#2750) 2024-02-06 11:38:38 -08:00
Hongxia Yang 56f738ae9b [ROCm] Fix some kernels failed unit tests (#2498) 2024-02-05 14:25:36 -08:00
Kunshang Ji 96b6f475dd Remove hardcoded device="cuda" to support more devices (#2503)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-02-01 15:46:39 -08:00
Philipp Moritz 89efcf1ce5 [Minor] Fix test_cache.py CI test failure (#2684) 2024-01-31 10:12:11 -08:00
Vladimir 4f65af0e25 Add swap_blocks unit tests (#2616) 2024-01-30 09:30:50 -08:00
zhaoyang-star 9090bf02e7 Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Simon Mo 6e01e8c1c8 [CI] Add Buildkite (#2355) 2024-01-14 12:37:58 -08:00
Woosuk Kwon 941767127c Revert the changes in test_cache (#2335) 2024-01-03 17:32:05 -08:00
Zhuohan Li fd4ea8ef5c Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
Jee Li 77af974b40 [FIX] Support non-zero CUDA devices in custom kernels (#1959) 2024-01-02 19:09:59 -08:00
Yanming W e0c6f556e8 [Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00
Woosuk Kwon 0ce8647dc5 Fix integer overflows in attention & cache ops (#1514) 2023-10-31 15:19:30 -07:00
Zhuohan Li ba0bfd40e2 TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
Woosuk Kwon fbd80ad409 Clean up kernel unit tests (#938) 2023-09-05 16:57:38 -07:00
Zhuohan Li d6fa1be3a8 [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Woosuk Kwon 0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
Woosuk Kwon 825d8892b5 Use pytest format for unit tests (#107) 2023-05-17 17:11:23 -07:00