vllm/tests/weight_loading at b5cbe8eeb30e86c8477d91c66f5c7a10e4ee754b - vllm

mirror of https://github.com/wassname/vllm.git synced 2026-07-02 05:54:09 +08:00

Files

Dipika Sikka 60508ffda9 [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)

Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>

2024-12-18 09:57:16 -05:00

models-large.txt

[Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213)

2024-10-10 14:15:40 +08:00

models.txt

[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)

2024-12-18 09:57:16 -05:00

run_model_weight_loading_test.sh

[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)

2024-12-18 09:57:16 -05:00

test_weight_loading.py

[Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)

2024-12-18 09:57:16 -05:00