Mirror of https://github.com/wassname/vllm.git, synced 2026-06-27 17:32:55 +08:00
[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971)
Co-authored-by: Jianan Gu <jianan.gu@intel.com>
@@ -65,7 +65,7 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
-- Support NVIDIA GPUs and AMD GPUs
+- Support NVIDIA GPUs, AMD GPUs, and Intel CPUs
 - (Experimental) Prefix caching support
 - (Experimental) Multi-lora support