mirror of
https://github.com/wassname/vllm.git
synced 2026-06-28 17:36:13 +08:00
8ceffbf315
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
469 B
469 B
(deployment-bentoml)=
BentoML
BentoML allows you to deploy a large language model (LLM) server with vLLM as the backend, which exposes OpenAI-compatible endpoints. You can serve the model locally or containerize it as an OCI-complicant image and deploy it on Kubernetes.
For details, see the tutorial vLLM inference in the BentoML documentation.