vllm/docs/source/features/quantization/index.md at cd1d3c3df845fc6baba4ab5ba4d168f3d632b92d - vllm - Gitea: Git with a cup of tea

wassname/vllm

mirror of https://github.com/wassname/vllm.git synced 2026-06-27 19:49:51 +08:00

Qubitium-ModelCloud cd1d3c3df8 [Docs] Add GPTQModel (#14056 )

Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

2025-03-03 21:59:09 +00:00

(quantization-index)=

# Quantization

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.

:::{toctree}
:caption: Contents
:maxdepth: 1

supported_hardware
auto_awq
bnb
gguf
gptqmodel
int4
int8
fp8
quantized_kvcache
:::
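
To make the memory trade-off described above concrete, the following is a rough back-of-the-envelope sketch, not part of vLLM. The helper `weight_memory_gib` and the 7-billion-parameter model size are hypothetical, chosen only for illustration. The estimate covers weights only and ignores quantization metadata (scales, zero points), activations, and the KV cache, so real footprints will be somewhat larger.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB.

    Hypothetical helper for illustration; ignores scales, zero points,
    activations, and the KV cache.
    """
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    num_params = 7_000_000_000  # assumed 7B-parameter model
    for name, bits in [("fp16", 16), ("int8 / fp8", 8), ("int4", 4)]:
        gib = weight_memory_gib(num_params, bits)
        print(f"{name:>10}: {gib:.2f} GiB")
```

Under these assumptions, halving the bits per weight halves the weight memory, which is why 4-bit and 8-bit formats can bring a model within reach of devices that cannot hold the 16-bit weights.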