8ce9c50d40 | 2023-09-02 14:59:47 +09:00 | Woosuk Kwon | Avoid compiling kernels for double data type (#933)
32b6816e55 | 2023-09-01 11:19:43 +09:00 | Woosuk Kwon | Add tests for models (#922)
c128d69856 | 2023-08-31 17:18:34 -07:00 | Zhuohan Li | Fix README.md Link (#927)
55b28b1eee | 2023-08-31 16:28:39 -07:00 | Woosuk Kwon | [Docs] Minor fixes in supported models (#920)
    * Minor fix in supported models
    * Add another small fix for Aquila model
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
e11222333f | 2023-09-01 00:37:17 +09:00 | Dong-Yong Lee | fix: bug fix when penalties are negative (#913)
    Co-authored-by: dongyong-lee <dongyong.lee@navercorp.com>
28873a2799 | 2023-08-31 13:28:43 +09:00 | Aman Gupta Karmani | Improve _prune_hidden_states micro-benchmark (#707)
0080d8329d | 2023-08-30 02:26:47 -07:00 | Zhuohan Li | Add acknowledgement to a16z grant
0d93f15694 | 2023-08-30 01:00:13 -07:00 | JFDuan | Accelerate LLaMA model loading (#234)
becd7a56f1 | 2023-08-29 21:54:08 -07:00 | lplcor | Enable request body OpenAPI spec for OpenAI endpoints (#865)
75471386de | 2023-08-29 21:52:13 -07:00 | Aman Gupta Karmani | use flash-attn via xformers (#877)
d2b2eed67c | 2023-08-27 23:00:56 -07:00 | Zhuohan Li | [Fix] Fix a condition for ignored sequences (#867)
4b6f069b6f | 2023-08-25 12:44:07 -07:00 | Antoni Baum | Add support for CodeLlama (#854)
791d79de32 | 2023-08-25 12:28:00 +09:00 | Woosuk Kwon | Bump up the version to v0.1.4 (#846)
94d2f59895 | 2023-08-25 12:22:01 +09:00 | Woosuk Kwon | Set replacement=True in torch.multinomial (#858)
75c0ca9d43 | 2023-08-23 16:44:15 -07:00 | wenjun93 | Clean up code (#844)
2a4ec90854 | 2023-08-23 17:44:21 +09:00 | Woosuk Kwon | Fix for breaking changes in xformers 0.0.21 (#834)
85ebcda94d | 2023-08-22 20:48:36 -07:00 | ldwang | Fix typo of Aquila in README.md (#836)
d64bf1646c | 2023-08-23 07:43:21 +09:00 | Woosuk Kwon | Implement approximate GELU kernels (#828)
a41c20435e | 2023-08-23 07:28:38 +09:00 | Woosuk Kwon | Add compute capability 8.9 to default targets (#829)
eedac9dba0 | 2023-08-22 11:55:16 -07:00 | Wen Sun | fix: revert code to avoid no attribute problem (#827)
14f9c72bfd | 2023-08-22 11:51:44 -07:00 | Zhuohan Li | Update Supported Model List (#825)
ad5f2fe34c | 2023-08-22 00:13:36 -07:00 | shunxing1234 | Add support for aquila (#663)
    * add aquila
    * fix some bug
    * delete pdb
    * fix bugs
    * fix bugs
    * delete whitespace
    * format
    * fix order
    Signed-off-by: ftgreat <ftgreat@163.com>
    Signed-off-by: shunxing1234 <xw747777271@gmail.com>
    Co-authored-by: ftgreat <ftgreat@163.com>
4f8584756d | 2023-08-21 22:22:06 -07:00 | zhaoyang-star | Fix mqa is false case in gpt_bigcode (#806)
65fc1c3127 | 2023-08-21 16:05:44 -07:00 | Xudong Zhang | set default coompute capability according to cuda version (#773)
c393af6cd7 | 2023-08-21 16:59:15 +09:00 | Daniel | [Feature | CI] Added a github action to build wheels (#746)
0c04ce3234 | 2023-08-18 10:12:46 +09:00 | wangcx18 | Fix typo in sampling_params.py (#788)
73b3de79ea | 2023-08-17 12:56:04 -07:00 | Xinyu Yang | explicitly del state (#784)
d1744376ae | 2023-08-15 16:44:33 -07:00 | Abraham-Xu | Align with huggingface Top K sampling (#753)
805de738f6 | 2023-08-14 22:26:36 -07:00 | Ikko Eltociear Ashimine | Fix typo in tokenizer.py (#750)
    conjuction -> conjunction
1b151ed181 | 2023-08-13 20:57:31 -07:00 | Uranus | Fix baichuan doc style (#748)
e06f504a76 | 2023-08-11 12:14:34 -07:00 | WanMok | Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715)
462ae5220a | 2023-08-11 11:40:37 -07:00 | WRH | [Fix] unwantted bias in InternLM Model (#740)
66c54aa9c3 | 2023-08-08 17:43:49 -07:00 | Nicolas Basile | Check the max prompt length for the OpenAI completions API (#472)
735ecfff61 | 2023-08-08 16:35:06 -07:00 | Jia Guoqing | add internlm model (#528)
a57d13cc96 | 2023-08-08 13:50:38 -07:00 | Qing | add QWen-7b (#685)
    Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
79af7e96a0 | 2023-08-04 10:57:29 -07:00 | Dean Leitersdorf | [OPTIMIZATION] Optimizes the single_query_cached_kv_attention kernel (#420)
621980bdc0 | 2023-08-04 10:35:22 -07:00 | Wen Sun | fix: incorrect bigcode attention heads num (#676)
aa84c92ef6 | 2023-08-02 16:46:53 -07:00 | Zhuohan Li | Bump up version to 0.1.3 (#657)
f7389f4763 | 2023-08-02 16:45:12 -07:00 | Zhuohan Li | [Doc] Add Baichuan 13B to supported models (#656)
55fe8a81ec | 2023-08-02 16:42:01 -07:00 | Woosuk Kwon | Refactor scheduler (#658)
e8ddc08ec8 | 2023-08-02 14:05:59 -07:00 | YHPeter | [BUG FIX] upgrade fschat version to 0.2.23 (#650)
    Co-authored-by: hao.yu <hao.yu@cn-c017.server.mila.quebec>
1b0bd0fe8a | 2023-08-02 14:04:39 -07:00 | Zhuohan Li | Add Falcon support (new) (#592)
20044cab7a | 2023-08-02 13:35:10 -07:00 | Lily Liu | Fix log message in scheduler (#652)
64f23c2900 | 2023-08-01 22:22:51 -07:00 | Song | fix baichuan for different position embedding for 7b and 13b models (#643)
d4c7755ca8 | 2023-08-01 15:41:36 -07:00 | Qing | fix biachuan-7b tp (#598)
    Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
aa39e42c5a | 2023-07-31 13:11:57 -07:00 | Chaofan Lin | fix doc (#622)
953f28cf9a | 2023-07-29 20:52:41 -07:00 | Fang li | fix ModuleNotFoundError (#599)
    Co-authored-by: fangli <fangli@tencent.com>
c0d00f5be6 | 2023-07-27 23:37:40 -07:00 | Xudong Zhang | [Fix] fix import error of RayWorker (#604) (#605)
58a072be15 | 2023-07-25 23:46:30 -07:00 | Zhuohan Li | [Fix] Add model sequence length into model config (#575)
82ad323dee | 2023-07-25 23:45:48 -07:00 | Zhuohan Li | [Fix] Add chat completion Example and simplify dependencies (#576)