[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971)

Co-authored-by: Jianan Gu <jianan.gu@intel.com>
2026-06-27 17:32:55 +08:00 · 2024-06-14 00:33:14 +08:00
parent bd43973522
commit 80aa7e91fc
6 changed files with 165 additions and 13 deletions
@@ -65,7 +65,7 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
- Support NVIDIA GPUs and AMD GPUs
+- Support NVIDIA GPUs, AMD GPUs, and Intel CPUs
 - (Experimental) Prefix caching support
 - (Experimental) Multi-lora support