vllm/csrc/quantization/fp8 at a88bb9b032d75aad74b2e1bd3d97b8e8a24e8b9d - vllm

mirror of https://github.com/wassname/vllm.git synced 2026-07-02 15:24:29 +08:00

Latest commit 12628d3c78 by Philipp Moritz: [Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343)

Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

2024-04-27 04:49:59 +00:00

2024-04-03 14:15:55 -07:00

fp8_cuda_kernels.cu    2024-04-27 04:49:59 +00:00