[Hardware][Intel] Optimize CPU backend and add more performance tips (#4971)

Co-authored-by: Jianan Gu <jianan.gu@intel.com>
Li, Jiang
2024-06-14 00:33:14 +08:00
committed by GitHub
parent bd43973522
commit 80aa7e91fc
6 changed files with 165 additions and 13 deletions
+1 -1
@@ -65,7 +65,7 @@ vLLM is flexible and easy to use with:
 - Tensor parallelism support for distributed inference
 - Streaming outputs
 - OpenAI-compatible API server
-- Support NVIDIA GPUs and AMD GPUs
+- Support NVIDIA GPUs, AMD GPUs, and Intel CPUs
 - (Experimental) Prefix caching support
 - (Experimental) Multi-lora support