Commit Graph - vllm

wassname/vllm

mirror of https://github.com/wassname/vllm.git synced 2026-06-27 18:27:02 +08:00

85ebcda94d Fix typo of Aquila in README.md (#836) ldwang 2023-08-23 11:48:36 +08:00
d64bf1646c Implement approximate GELU kernels (#828) Woosuk Kwon 2023-08-23 07:43:21 +09:00
a41c20435e Add compute capability 8.9 to default targets (#829) Woosuk Kwon 2023-08-23 07:28:38 +09:00
eedac9dba0 fix: revert code to avoid no attribute problem (#827) Wen Sun 2023-08-23 02:55:16 +08:00
14f9c72bfd Update Supported Model List (#825) Zhuohan Li 2023-08-22 11:51:44 -07:00
ad5f2fe34c Add support for aquila (#663) shunxing1234 2023-08-22 15:13:36 +08:00
4f8584756d Fix mqa is false case in gpt_bigcode (#806) zhaoyang-star 2023-08-22 13:22:06 +08:00
65fc1c3127 set default coompute capability according to cuda version (#773) Xudong Zhang 2023-08-22 07:05:44 +08:00
c393af6cd7 [Feature | CI] Added a github action to build wheels (#746) Daniel 2023-08-21 10:59:15 +03:00
0c04ce3234 Fix typo in sampling_params.py (#788) wangcx18 2023-08-18 09:12:46 +08:00
73b3de79ea explicitly del state (#784) Xinyu Yang 2023-08-18 03:56:04 +08:00
d1744376ae Align with huggingface Top K sampling (#753) Abraham-Xu 2023-08-16 07:44:33 +08:00
805de738f6 Fix typo in tokenizer.py (#750) Ikko Eltociear Ashimine 2023-08-15 14:26:36 +09:00
1b151ed181 Fix baichuan doc style (#748) Uranus 2023-08-14 11:57:31 +08:00
e06f504a76 Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715) WanMok 2023-08-11 12:14:34 -07:00
462ae5220a [Fix] unwantted bias in InternLM Model (#740) WRH 2023-08-12 02:40:37 +08:00
66c54aa9c3 Check the max prompt length for the OpenAI completions API (#472) Nicolas Basile 2023-08-08 17:43:49 -07:00
735ecfff61 add internlm model (#528) Jia Guoqing 2023-08-09 07:35:06 +08:00
a57d13cc96 add QWen-7b (#685) Qing 2023-08-09 04:50:38 +08:00
79af7e96a0 [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420) Dean Leitersdorf 2023-08-04 20:57:29 +03:00
621980bdc0 fix: incorrect bigcode attention heads num (#676) Wen Sun 2023-08-05 01:35:22 +08:00
aa84c92ef6 Bump up version to 0.1.3 (#657) Zhuohan Li 2023-08-02 16:46:53 -07:00
f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) Zhuohan Li 2023-08-02 16:45:12 -07:00
55fe8a81ec Refactor scheduler (#658) Woosuk Kwon 2023-08-02 16:42:01 -07:00
e8ddc08ec8 [BUG FIX] upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -04:00
1b0bd0fe8a Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -07:00
20044cab7a Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -07:00
64f23c2900 fix baichuan for different position embedding for 7b and 13b models (#643) Song 2023-08-02 13:22:51 +08:00
d4c7755ca8 fix biachuan-7b tp (#598) Qing 2023-08-02 06:41:36 +08:00
aa39e42c5a fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +08:00
953f28cf9a fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +08:00
c0d00f5be6 [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +08:00
58a072be15 [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -07:00
82ad323dee [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -07:00
df5dd3c68e Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -07:00
2d867b55fa fixed tensor parallel is not defined (#564) MoeedDar 2023-07-25 22:16:51 +01:00
d7a1c6d614 Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +08:00
7d5a155e4a [Fix] Fix GPTBigcoder for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -07:00
1dde34e0f8 GPTJConfig has no attribute rotary. (#532) leegohi04517 2023-07-25 02:29:30 +08:00
6fc2a38b11 Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -07:00
c487a221ee Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -07:00
9925c17940 Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -07:00
8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +08:00
cf21a9bd5c support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +08:00
16c3e295a8 fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +02:00
bda41c70dd hotfix attn alibi wo head mapping (#496) Song 2023-07-19 02:31:48 +08:00
453bafb96f Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -07:00
328d231c17 Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +01:00
b4b195b360 fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -07:00
20b0d88d16 Add support for baichuan (#365) codethazine 2023-07-17 21:50:55 +01:00
2bdea7ac11 [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -04:00
58df2883cb [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -07:00
6d7d95a70a Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -04:00
96853af5a8 Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -04:00
dbed69058c Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +08:00
7b6ae94059 add vocab padding for LLama(Support WizardLM) (#411) panda 2023-07-14 11:56:22 +08:00
c6dfc3cdbe Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +08:00
51be365143 fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +08:00
c894836108 [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -04:00
75beba29b5 Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -04:00
ddfdf470ae Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -07:00
b6fbb9a565 Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -07:00
2179e4f4c5 avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -07:00
a945fcc2ae Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +02:00
be54f8e5c4 [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -05:00
b396cb4998 fix: only response [DONE] once when streaming response. (#378) Ricardo Lu 2023-07-07 09:08:40 +08:00
1c395b4eaa Bump up the version (#300) Woosuk Kwon 2023-07-04 21:41:53 -07:00
3d64cf019e [Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) akxxsb 2023-07-05 12:39:59 +08:00
98fe8cb542 [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -07:00
ffa6d2f9f9 [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -07:00
404422f42e [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -07:00
7717d0838b Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +08:00
42e0c1df78 [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -07:00
e41f06702c Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -07:00
d6fa1be3a8 [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -07:00
0ffded812a [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -07:00
0bd2a573a5 Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +02:00
49b26e2cec feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +08:00
dafd924c1f Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -07:00
598dc4b79a [Fix] Weight loading for GPTBigCode (#313) Zhuohan Li 2023-06-29 22:14:17 -07:00
85de093472 [Fix] Do not pin memory when in WSL (#312) Zhuohan Li 2023-06-29 15:00:21 -07:00
f72297562f Add news for the vllm+skypilot example (#314) Zhanghao Wu 2023-06-29 12:32:37 -07:00
9d27b09d12 Update README.md (#306) Bayang 2023-06-29 14:52:15 +01:00
998d9d1509 [Tokenizer] Add tokenizer mode (#298) Woosuk Kwon 2023-06-28 14:19:22 -07:00
425040d4c1 remove floats == 0 comparison (#285) Lily Liu 2023-06-28 14:11:51 -07:00
4338cc4750 [Tokenizer] Add an option to specify tokenizer (#284) Woosuk Kwon 2023-06-28 09:46:58 -07:00
bdd6b4c8bc Add LLM.set_tokenizer (#283) Jishnu Ray Chowdhury 2023-06-28 02:28:29 -05:00
2b7d3aca2e Update setup.py (#282) Cody Yu 2023-06-27 14:34:23 -07:00
4026a049d3 expand coverage of gpt2 model loading (#271) twaka 2023-06-27 22:27:41 +09:00
43710e8d09 [Fix] Fix default port number in benchmark scripts (#265) Zhuohan Li 2023-06-26 13:15:35 -07:00
526df28fb2 [BugFix] Fix a bug in counting running sequences (#266) Woosuk Kwon 2023-06-26 13:09:02 -07:00
2cf1a333b6 [Doc] Documentation for distributed inference (#261) Zhuohan Li 2023-06-26 11:34:23 -07:00
0b7db411b5 [Bug] Fix the OOM condition for CPU cache (#260) Zhuohan Li 2023-06-26 11:16:13 -07:00
471a7a4566 Compatible with Decapoda Research llama hf version (#251) BasicCoder 2023-06-27 00:23:57 +08:00
6214dd6ce9 Update README.md (#236) Lianmin Zheng 2023-06-25 16:58:06 -07:00
0603379863 fix wrong using getattr to get dict value (#232) metacryptom 2023-06-25 13:00:24 +08:00
665c48963b [Docs] Add GPTBigCode to supported models (#213) Woosuk Kwon 2023-06-22 15:05:11 -07:00
298695b766 GPTBigCode (StarCoder, SantaCoder Support) (#209) Michael Feil 2023-06-22 19:49:27 +02:00
83658c8ace Bump up version to 0.1.1 (#204) Zhuohan Li 2023-06-22 15:33:32 +08:00
1d24ccb96c [Fix] Better error message when there is OOM during cache initialization (#203) Zhuohan Li 2023-06-22 15:30:06 +08:00
