Commit Graph

17 Commits

Author SHA1 Message Date
Michael Goin 5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with `clang-format` (#4722) 2024-05-22 07:18:41 +00:00
SangBin Cho 2e9a2227ec
[Lora] Support long context lora (#4787)
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through.

It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors.

Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
SangBin Cho fb087af52e
[mypy][7/N] Cover all directories (#4555) 2024-05-02 10:47:41 -07:00
SangBin Cho cf8cac8c70
[mypy][6/N] Fix all the core subdirectory typing (#4450)
Co-authored-by: Cade Daniel <edacih@gmail.com>
2024-05-02 03:01:00 +00:00
SangBin Cho df29793dc7
[mypy][5/N] Support all typing on model executor (#4427) 2024-04-28 19:01:26 -07:00
SangBin Cho b5b4a398a7
[Mypy] Typing lora folder (#4337) 2024-04-25 19:13:50 +00:00
SangBin Cho 0ae11f78ab
[Mypy] Part 3 fix typing for nested directories for most of directory (#4161) 2024-04-22 21:32:44 -07:00
SangBin Cho 533d2a1f39
[Typing] Mypy typing part 2 (#4043)
Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>
2024-04-17 17:28:43 -07:00
SangBin Cho 09473ee41c
[mypy] Add mypy type annotation part 1 (#4006) 2024-04-12 14:35:50 -07:00
SangBin Cho 01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
Zhuohan Li 4c922709b6
Add distributed model executor abstraction (#3191) 2024-03-11 11:03:45 -07:00
Massimiliano Pronesti 93dc5a2870
chore(vllm): codespell for spell checking (#2820) 2024-02-21 18:56:01 -08:00
Simon Mo 1e4277d2d1
lint: format all python file instead of just source code (#2567) 2024-01-23 15:53:06 -08:00
Simon Mo 5ffc0d13a2
Migrate linter from `pylint` to `ruff` (#1665) 2023-11-20 11:58:01 -08:00
Cade Daniel e575df33b1
[Small] Formatter only checks lints in changed files (#1528) 2023-10-31 15:39:38 -07:00
Zhuohan Li ba0bfd40e2
TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
Zhuohan Li d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00