15 Commits

Author SHA1 Message Date
Cyrus Leung b1c255630d
[Core] Avoid the need to pass `None` values to `Sequence.inputs` (#5099) 2024-05-29 16:05:01 -07:00
afeldman-nm 4238bc82f2
[Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837) 2024-05-29 16:09:13 +00:00
Cyrus Leung 5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Robert Shaw fcc2994be6
[CI] Nits for bad initialization of SeqGroup in testing (#4748) 2024-05-10 18:01:01 -04:00
youkaichao 20cfcdec99
[Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) 2024-05-08 12:07:05 -07:00
youkaichao 469f85c782
[Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) 2024-05-07 11:06:32 -07:00
SangBin Cho 0f8a91401c
[Core] Ignore infeasible swap requests. (#4557) 2024-05-02 14:31:20 -07:00
Cade Daniel 93deb0b38f
[Speculative decoding 4/9] Lookahead scheduling for speculative decoding (#3250) 2024-04-01 22:55:24 +00:00
Cade Daniel 14ccd94c89
[Core][Bugfix]Refactor block manager for better testability (#3492) 2024-03-27 23:59:28 -07:00
SangBin Cho 01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
ElizaWszola 9474e89ba4
[PREFIX CACHING FOLLOW UP] A bunch of fixes to block allocator performance when automatic prefix caching is disabled (#3357)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-03-20 00:11:11 -07:00
Breno Faria 49a3c8662b
Fixes #1556 double free (#3347) 2024-03-13 00:30:08 +00:00
Zhuohan Li 2f8844ba08
Re-enable the 80 char line width limit (#3305) 2024-03-10 19:49:14 -07:00
Cade Daniel a33ce60c66
[Testing] Fix core tests (#3224) 2024-03-06 01:04:23 -08:00
SangBin Cho 24aecf421a
[Tests] Add block manager and scheduler tests (#3108) 2024-03-05 18:23:34 -08:00