Commit Graph

  • 85ebcda94d Fix typo of Aquila in README.md (#836) ldwang 2023-08-23 11:48:36 +08:00
  • d64bf1646c Implement approximate GELU kernels (#828) Woosuk Kwon 2023-08-23 07:43:21 +09:00
  • a41c20435e Add compute capability 8.9 to default targets (#829) Woosuk Kwon 2023-08-23 07:28:38 +09:00
  • eedac9dba0 fix: revert code to avoid no attribute problem (#827) Wen Sun 2023-08-23 02:55:16 +08:00
  • 14f9c72bfd Update Supported Model List (#825) Zhuohan Li 2023-08-22 11:51:44 -07:00
  • ad5f2fe34c Add support for aquila (#663) shunxing1234 2023-08-22 15:13:36 +08:00
  • 4f8584756d Fix mqa is false case in gpt_bigcode (#806) zhaoyang-star 2023-08-22 13:22:06 +08:00
  • 65fc1c3127 set default compute capability according to cuda version (#773) Xudong Zhang 2023-08-22 07:05:44 +08:00
  • c393af6cd7 [Feature | CI] Added a github action to build wheels (#746) Daniel 2023-08-21 10:59:15 +03:00
  • 0c04ce3234 Fix typo in sampling_params.py (#788) wangcx18 2023-08-18 09:12:46 +08:00
  • 73b3de79ea explicitly del state (#784) Xinyu Yang 2023-08-18 03:56:04 +08:00
  • d1744376ae Align with huggingface Top K sampling (#753) Abraham-Xu 2023-08-16 07:44:33 +08:00
  • 805de738f6 Fix typo in tokenizer.py (#750) Ikko Eltociear Ashimine 2023-08-15 14:26:36 +09:00
  • 1b151ed181 Fix baichuan doc style (#748) Uranus 2023-08-14 11:57:31 +08:00
  • e06f504a76 Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715) WanMok 2023-08-11 12:14:34 -07:00
  • 462ae5220a [Fix] unwanted bias in InternLM Model (#740) WRH 2023-08-12 02:40:37 +08:00
  • 66c54aa9c3 Check the max prompt length for the OpenAI completions API (#472) Nicolas Basile 2023-08-08 17:43:49 -07:00
  • 735ecfff61 add internlm model (#528) Jia Guoqing 2023-08-09 07:35:06 +08:00
  • a57d13cc96 add QWen-7b (#685) Qing 2023-08-09 04:50:38 +08:00
  • 79af7e96a0 [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420) Dean Leitersdorf 2023-08-04 20:57:29 +03:00
  • 621980bdc0 fix: incorrect bigcode attention heads num (#676) Wen Sun 2023-08-05 01:35:22 +08:00
  • aa84c92ef6 Bump up version to 0.1.3 (#657) Zhuohan Li 2023-08-02 16:46:53 -07:00
  • f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) Zhuohan Li 2023-08-02 16:45:12 -07:00
  • 55fe8a81ec Refactor scheduler (#658) Woosuk Kwon 2023-08-02 16:42:01 -07:00
  • e8ddc08ec8 [BUG FIX] upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -04:00
  • 1b0bd0fe8a Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -07:00
  • 20044cab7a Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -07:00
  • 64f23c2900 fix baichuan for different position embedding for 7b and 13b models (#643) Song 2023-08-02 13:22:51 +08:00
  • d4c7755ca8 fix baichuan-7b tp (#598) Qing 2023-08-02 06:41:36 +08:00
  • aa39e42c5a fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +08:00
  • 953f28cf9a fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +08:00
  • c0d00f5be6 [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +08:00
  • 58a072be15 [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -07:00
  • 82ad323dee [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -07:00
  • df5dd3c68e Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -07:00
  • 2d867b55fa fixed tensor parallel is not defined (#564) MoeedDar 2023-07-25 22:16:51 +01:00
  • d7a1c6d614 Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +08:00
  • 7d5a155e4a [Fix] Fix GPTBigcoder for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -07:00
  • 1dde34e0f8 GPTJConfig has no attribute rotary. (#532) leegohi04517 2023-07-25 02:29:30 +08:00
  • 6fc2a38b11 Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -07:00
  • c487a221ee Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -07:00
  • 9925c17940 Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -07:00
  • 8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +08:00
  • cf21a9bd5c support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +08:00
  • 16c3e295a8 fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +02:00
  • bda41c70dd hotfix attn alibi wo head mapping (#496) Song 2023-07-19 02:31:48 +08:00
  • 453bafb96f Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -07:00
  • 328d231c17 Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +01:00
  • b4b195b360 fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -07:00
  • 20b0d88d16 Add support for baichuan (#365) codethazine 2023-07-17 21:50:55 +01:00
  • 2bdea7ac11 [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -04:00
  • 58df2883cb [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -07:00
  • 6d7d95a70a Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -04:00
  • 96853af5a8 Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -04:00
  • dbed69058c Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +08:00
  • 7b6ae94059 add vocab padding for LLama(Support WizardLM) (#411) panda 2023-07-14 11:56:22 +08:00
  • c6dfc3cdbe Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +08:00
  • 51be365143 fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +08:00
  • c894836108 [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -04:00
  • 75beba29b5 Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -04:00
  • ddfdf470ae Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -07:00
  • b6fbb9a565 Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -07:00
  • 2179e4f4c5 avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -07:00
  • a945fcc2ae Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +02:00
  • be54f8e5c4 [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -05:00
  • b396cb4998 fix: only response [DONE] once when streaming response. (#378) Ricardo Lu 2023-07-07 09:08:40 +08:00
  • 1c395b4eaa Bump up the version (#300) Woosuk Kwon 2023-07-04 21:41:53 -07:00
  • 3d64cf019e [Server] use fastchat.model.model_adapter.get_conversation_template method to get model template (#357) akxxsb 2023-07-05 12:39:59 +08:00
  • 98fe8cb542 [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -07:00
  • ffa6d2f9f9 [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -07:00
  • 404422f42e [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -07:00
  • 7717d0838b Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +08:00
  • 42e0c1df78 [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -07:00
  • e41f06702c Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -07:00
  • d6fa1be3a8 [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -07:00
  • 0ffded812a [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -07:00
  • 0bd2a573a5 Allow send list of str for the Prompt on openai demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +02:00
  • 49b26e2cec feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +08:00
  • dafd924c1f Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -07:00
  • 598dc4b79a [Fix] Weight loading for GPTBigCode (#313) Zhuohan Li 2023-06-29 22:14:17 -07:00
  • 85de093472 [Fix] Do not pin memory when in WSL (#312) Zhuohan Li 2023-06-29 15:00:21 -07:00
  • f72297562f Add news for the vllm+skypilot example (#314) Zhanghao Wu 2023-06-29 12:32:37 -07:00
  • 9d27b09d12 Update README.md (#306) Bayang 2023-06-29 14:52:15 +01:00
  • 998d9d1509 [Tokenizer] Add tokenizer mode (#298) Woosuk Kwon 2023-06-28 14:19:22 -07:00
  • 425040d4c1 remove floats == 0 comparison (#285) Lily Liu 2023-06-28 14:11:51 -07:00
  • 4338cc4750 [Tokenizer] Add an option to specify tokenizer (#284) Woosuk Kwon 2023-06-28 09:46:58 -07:00
  • bdd6b4c8bc Add LLM.set_tokenizer (#283) Jishnu Ray Chowdhury 2023-06-28 02:28:29 -05:00
  • 2b7d3aca2e Update setup.py (#282) Cody Yu 2023-06-27 14:34:23 -07:00
  • 4026a049d3 expand coverage of gpt2 model loading (#271) twaka 2023-06-27 22:27:41 +09:00
  • 43710e8d09 [Fix] Fix default port number in benchmark scripts (#265) Zhuohan Li 2023-06-26 13:15:35 -07:00
  • 526df28fb2 [BugFix] Fix a bug in counting running sequences (#266) Woosuk Kwon 2023-06-26 13:09:02 -07:00
  • 2cf1a333b6 [Doc] Documentation for distributed inference (#261) Zhuohan Li 2023-06-26 11:34:23 -07:00
  • 0b7db411b5 [Bug] Fix the OOM condition for CPU cache (#260) Zhuohan Li 2023-06-26 11:16:13 -07:00
  • 471a7a4566 Compatible with Decapoda Research llama hf version (#251) BasicCoder 2023-06-27 00:23:57 +08:00
  • 6214dd6ce9 Update README.md (#236) Lianmin Zheng 2023-06-25 16:58:06 -07:00
  • 0603379863 fix wrong using getattr to get dict value (#232) metacryptom 2023-06-25 13:00:24 +08:00
  • 665c48963b [Docs] Add GPTBigCode to supported models (#213) Woosuk Kwon 2023-06-22 15:05:11 -07:00
  • 298695b766 GPTBigCode (StarCoder, SantaCoder Support) (#209) Michael Feil 2023-06-22 19:49:27 +02:00
  • 83658c8ace Bump up version to 0.1.1 (#204) Zhuohan Li 2023-06-22 15:33:32 +08:00
  • 1d24ccb96c [Fix] Better error message when there is OOM during cache initialization (#203) Zhuohan Li 2023-06-22 15:30:06 +08:00