Commit Graph

304 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Lily Liu | 425040d4c1 | remove floats == 0 comparison (#285) | 2023-06-28 14:11:51 -07:00 |
| Woosuk Kwon | 4338cc4750 | [Tokenizer] Add an option to specify tokenizer (#284) | 2023-06-28 09:46:58 -07:00 |
| Jishnu Ray Chowdhury | bdd6b4c8bc | Add LLM.set_tokenizer (#283) | 2023-06-28 00:28:29 -07:00 |
| Cody Yu | 2b7d3aca2e | Update setup.py (#282) (Co-authored-by: neubig \<neubig@gmail.com\>) | 2023-06-27 14:34:23 -07:00 |
| twaka | 4026a049d3 | expand coverage of gpt2 model loading (#271) | 2023-06-27 06:27:41 -07:00 |
| Zhuohan Li | 43710e8d09 | [Fix] Fix default port number in benchmark scripts (#265) | 2023-06-26 13:15:35 -07:00 |
| Woosuk Kwon | 526df28fb2 | [BugFix] Fix a bug in counting running sequences (#266) | 2023-06-26 13:09:02 -07:00 |
| Zhuohan Li | 2cf1a333b6 | [Doc] Documentation for distributed inference (#261) | 2023-06-26 11:34:23 -07:00 |
| Zhuohan Li | 0b7db411b5 | [Bug] Fix the OOM condition for CPU cache (#260) | 2023-06-26 11:16:13 -07:00 |
| BasicCoder | 471a7a4566 | Compatible with Decapoda Research llama hf version (#251) | 2023-06-26 09:23:57 -07:00 |
| Lianmin Zheng | 6214dd6ce9 | Update README.md (#236) | 2023-06-25 16:58:06 -07:00 |
| metacryptom | 0603379863 | fix wrong using getattr to get dict value (#232) | 2023-06-24 22:00:24 -07:00 |
| Woosuk Kwon | 665c48963b | [Docs] Add GPTBigCode to supported models (#213) | 2023-06-22 15:05:11 -07:00 |
| Michael Feil | 298695b766 | GPTBigCode (StarCoder, SantaCoder Support) (#209) | 2023-06-23 01:49:27 +08:00 |
| Zhuohan Li | 83658c8ace | Bump up version to 0.1.1 (#204) | 2023-06-22 15:33:32 +08:00 |
| Zhuohan Li | 1d24ccb96c | [Fix] Better error message when there is OOM during cache initialization (#203) | 2023-06-22 15:30:06 +08:00 |
| Woosuk Kwon | 14f0b39cda | [Bugfix] Fix a bug in RequestOutput.finished (#202) | 2023-06-22 00:17:24 -07:00 |
| Zhuohan Li | 2e0d314384 | fix-ray (#193) | 2023-06-22 00:21:41 +08:00 |
| Woosuk Kwon | 67d96c29fb | Use slow tokenizer for open llama models (#168) | 2023-06-20 14:19:47 +08:00 |
| Zhuohan Li | 033f5c78f5 | Remove e.g. in README (#167) | 2023-06-20 14:00:28 +08:00 |
| Woosuk Kwon | 794e578de0 | [Minor] Fix URLs (#166) | 2023-06-19 22:57:14 -07:00 |
| Woosuk Kwon | caddfc14c1 | [Minor] Fix icons in doc (#165) | 2023-06-19 20:35:38 -07:00 |
| Zhuohan Li | fc72e39de3 | Change image urls (#164) | 2023-06-20 11:15:15 +08:00 |
| Woosuk Kwon | b7e62d3454 | Fix repo & documentation URLs (#163) | 2023-06-19 20:03:40 -07:00 |
| Woosuk Kwon | 364536acd1 | [Docs] Minor fix (#162) | 2023-06-19 19:58:23 -07:00 |
| Zhuohan Li | 0b32a987dd | Add and list supported models in README (#161) | 2023-06-20 10:57:46 +08:00 |
| Woosuk Kwon | 570fb2e9cc | [PyPI] Fix package info in setup.py (#158) | 2023-06-19 18:05:01 -07:00 |
| Zhuohan Li | a255885f83 | Add logo and polish readme (#156) | 2023-06-19 16:31:13 +08:00 |
| Woosuk Kwon | 5822ede66e | Add performance figures for dark mode (#160) | 2023-06-18 23:46:24 -07:00 |
| Zhuohan Li | 0370afa2e5 | Remove benchmark_async_llm_server.py (#155) | 2023-06-19 11:12:37 +08:00 |
| Woosuk Kwon | 7e2a913c64 | [Minor] Fix `CompletionOutput.__repr__` (#157) | 2023-06-18 19:58:25 -07:00 |
| Woosuk Kwon | 3f92038b99 | Add comments on swap space (#154) | 2023-06-18 11:39:35 -07:00 |
| Woosuk Kwon | dcda03b4cb | Write README and front page of doc (#147) | 2023-06-18 03:19:38 -07:00 |
| Zhuohan Li | bf5f121c02 | Reduce GPU memory utilization to make sure OOM doesn't happen (#153) | 2023-06-18 17:33:50 +08:00 |
| Zhuohan Li | bec7b2dc26 | Add quickstart guide (#148) | 2023-06-18 01:26:12 +08:00 |
| Woosuk Kwon | 0b98ba15c7 | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| Zhuohan Li | e5464ee484 | Rename servers to engines (#152) | 2023-06-17 17:25:21 +08:00 |
| Woosuk Kwon | bab8f3dd0d | [Minor] Fix benchmark_throughput.py (#151) | 2023-06-16 21:00:52 -07:00 |
| Zhuohan Li | eedb46bf03 | Rename servers and change port numbers to reduce confusion (#149) | 2023-06-17 00:13:02 +08:00 |
| Woosuk Kwon | 311490a720 | Add script for benchmarking serving throughput (#145) | 2023-06-14 19:55:38 -07:00 |
| Woosuk Kwon | da5ddcd544 | Remove redundant code in ColumnParallelLinear (#146) | 2023-06-10 21:25:11 -07:00 |
| Zhuohan Li | 5020e1e80c | Non-streaming simple fastapi server (#144) | 2023-06-10 10:43:07 -07:00 |
| Zhuohan Li | 4298374265 | Add docstrings for LLMServer and related classes and examples (#142) | 2023-06-07 18:25:20 +08:00 |
| Woosuk Kwon | e38074b1e6 | Support FP32 (#141) | 2023-06-07 00:40:21 -07:00 |
| Woosuk Kwon | 376725ce74 | [PyPI] Packaging for PyPI distribution (#140) | 2023-06-05 20:03:14 -07:00 |
| Woosuk Kwon | 456941cfe4 | [Docs] Write the `Adding a New Model` section (#138) | 2023-06-05 20:01:26 -07:00 |
| Zhuohan Li | 1a956e136b | Fix various issues of async servers (#135) | 2023-06-05 23:44:50 +08:00 |
| Woosuk Kwon | 8274ca23ac | Add docstrings for LLM (#137) | 2023-06-04 12:52:41 -07:00 |
| Woosuk Kwon | 62ec38ea41 | Document supported models (#127) | 2023-06-02 22:35:17 -07:00 |
| Woosuk Kwon | 0eda2e0953 | Add .readthedocs.yaml (#136) | 2023-06-02 22:27:44 -07:00 |