Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
6 lines
444 B
Markdown
(deployment-triton)=
# NVIDIA Triton
The [Triton Inference Server](https://github.com/triton-inference-server) project hosts a tutorial that shows how to quickly deploy a small [facebook/opt-125m](https://huggingface.co/facebook/opt-125m) model using vLLM. See [Deploying a vLLM model in Triton](https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton) for details.
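
Once a server from that tutorial is running, a client can send it a prompt over HTTP. The following is a minimal sketch, not taken from this page: it assumes Triton's HTTP endpoint is on `localhost:8000`, that the model is registered as `vllm_model`, and that the generate endpoint accepts a `text_input` field and returns `text_output`. Check the linked tutorial for the values that apply to your deployment. The helper names `build_generate_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumptions (not stated on this page): adjust to match your deployment.
TRITON_URL = "http://localhost:8000"  # assumed Triton HTTP endpoint
MODEL_NAME = "vllm_model"             # assumed model name in the model repository


def build_generate_request(prompt, temperature=0.0, stream=False):
    """Build (but do not send) a POST request for the generate endpoint."""
    url = f"{TRITON_URL}/v2/models/{MODEL_NAME}/generate"
    payload = {
        "text_input": prompt,  # assumed input field name
        "parameters": {"stream": stream, "temperature": temperature},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt):
    """Send the prompt to a running server and return the generated text."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.loads(response.read())
    return body["text_output"]  # assumed output field name
```

With the server up, `generate("What is Triton Inference Server?")` should return the model's completion as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.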