(serving-llamaindex)=
LlamaIndex
vLLM is also available via LlamaIndex.
To install the LlamaIndex integration for vLLM, run:
$ pip install llama-index-llms-vllm -q
To run inference on one or more GPUs, use the Vllm class from LlamaIndex:
from llama_index.llms.vllm import Vllm

llm = Vllm(
    model="microsoft/Orca-2-7b",
    tensor_parallel_size=4,
    max_new_tokens=100,
    vllm_kwargs={"swap_space": 1, "gpu_memory_utilization": 0.5},
)
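The constructor arguments above can be assembled and checked before the model is loaded. The following is a minimal sketch, not part of vLLM or LlamaIndex: `build_vllm_settings` and `complete_once` are hypothetical helper names, and mapping the GPU count directly to `tensor_parallel_size` is a simplifying assumption. The `complete` call is assumed to follow the standard LlamaIndex LLM interface, which returns a response object with a `text` attribute.

```python
from typing import Any, Dict


def build_vllm_settings(
    model: str,
    num_gpus: int = 1,
    max_new_tokens: int = 100,
    gpu_memory_utilization: float = 0.5,
    swap_space: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for the Vllm constructor."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    return {
        "model": model,
        # Assumption: shard the model across all requested GPUs.
        "tensor_parallel_size": num_gpus,
        "max_new_tokens": max_new_tokens,
        # Extra options forwarded to the underlying vLLM engine.
        "vllm_kwargs": {
            "swap_space": swap_space,
            "gpu_memory_utilization": gpu_memory_utilization,
        },
    }


def complete_once(prompt: str, settings: Dict[str, Any]) -> str:
    """Hypothetical helper: load the model and return one completion as text."""
    # Imported lazily so the settings helper works without vLLM installed.
    from llama_index.llms.vllm import Vllm

    llm = Vllm(**settings)
    return llm.complete(prompt).text


# Same configuration as the example above, built for four GPUs.
settings = build_vllm_settings("microsoft/Orca-2-7b", num_gpus=4)
```

Calling `complete_once("What is a black hole?", settings)` would then load the model and return the generated text; it requires the `llama-index-llms-vllm` package and enough available GPUs.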
Please refer to the LlamaIndex vLLM tutorial for more details.