Commit Graph

3186 Commits

Author SHA1 Message Date
Joe Runde 3b3f1e7436
[Bugfix][core] replace heartbeat with pid check (#9818)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-30 09:34:07 -07:00
Elfie Guo 9ff4511e43
[Misc] Add chunked-prefill support on FlashInfer. (#9781)
2024-10-30 09:33:53 -07:00
Went-Liang 81f09cfd80
[Model] Support math-shepherd-mistral-7b-prm model (#9697)
Signed-off-by: Went-Liang <wenteng_liang@163.com>
2024-10-30 09:33:42 -07:00
Alex Brooks cc98f1e079
[CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
Woosuk Kwon 211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)
2024-10-30 09:41:38 +00:00
Jee Jee Li 6aa6020f9b
[Misc] Specify minimum pynvml version (#9827)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-29 23:05:43 -07:00
youkaichao ff5ed6e1bc
[torch.compile] rework compile control with piecewise cudagraph (#9715)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 23:03:49 -07:00
Russell Bryant 7b0365efef
[Doc] Add the DCO to CONTRIBUTING.md (#9803)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-30 05:22:23 +00:00
Yan Ma 04a3ae0aca
[Bugfix] Fix multi nodes TP+PP for XPU (#8884)
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
2024-10-29 21:34:45 -07:00
Kevin H. Luu 62fac4b9aa
[ci/build] Pin CI dependencies version with pip-compile (#9810)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-30 03:34:55 +00:00
Michael Goin 226688bd61
[Bugfix][VLM] Make apply_fp8_linear work with >2D input (#9812)
2024-10-29 19:49:44 -07:00
Lily Liu 64cb1cdc3f
Update README.md (#9819)
2024-10-29 17:28:43 -07:00
youkaichao 1ab6f6b4ad
[core][distributed] fix custom allreduce in pytorch 2.5 (#9815)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 17:06:24 -07:00
Michael Goin bc73e9821c
[Bugfix] Fix prefix strings for quantized VLMs (#9772)
2024-10-29 16:02:59 -07:00
Simon Mo 8d7724104a
[Docs] Add notes about Snowflake Meetup (#9814)
Signed-off-by: simon-mo <simon.mo@hey.com>
2024-10-29 15:19:02 -07:00
Will Eaton 882a1ad0de
[Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
2024-10-29 15:07:37 -07:00
Joe Runde 67bdf8e523
[Bugfix][Frontend] Guard against bad token ids (#9634)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-29 14:13:20 -07:00
Kunjan 0ad216f575
[MISC] Set label value to timestamp over 0, to keep track of recent history (#9777)
Signed-off-by: Kunjan Patel <kunjanp@google.com>
2024-10-29 19:52:19 +00:00
Russell Bryant 7585ec996f
[CI/Build] mergify: fix rules for ci/build label (#9804)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-29 19:24:42 +00:00
Michael Goin ab6f981671
[CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808)
2024-10-29 11:12:43 -07:00
Junichi Sato ac3d748dba
[Model] Add LlamaEmbeddingModel as an embedding Implementation of LlamaModel (#9806)
2024-10-29 10:40:35 -07:00
yannicks1 0ce7798f44
[Misc]: Typo fix: Renaming classes (casualLM -> causalLM) (#9801)
Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
2024-10-29 10:39:20 -07:00
Sven Seeberg 0f43387157
[Bugfix] Use host argument to bind to interface (#9798)
2024-10-29 10:37:59 -07:00
tastelikefeet 08600ddc68
Fix the log to correct guide user to install modelscope (#9793)
Signed-off-by: yuze.zyz <yuze.zyz@alibaba-inc.com>
2024-10-29 10:36:59 -07:00
科英 74fc2d77ae
[Misc] Add metrics for request queue time, forward time, and execute time (#9659)
2024-10-29 10:32:56 -07:00
wangshuai09 622b7ab955
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
Isotr0py 09500f7dde
[Model] Add BNB quantization support for Mllama (#9720)
2024-10-29 08:20:02 -04:00
Zhong Qishuai ef7865b4f9
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
2024-10-29 11:49:47 +00:00
Cyrus Leung eae3d48181
[Bugfix] Use temporary directory in registry (#9721)
2024-10-28 22:08:20 -07:00
Cyrus Leung e74f2d448c
[Doc] Specify async engine args in docs (#9726)
2024-10-28 22:07:57 -07:00
Jee Jee Li 7a4df5f200
[Model][LoRA]LoRA support added for Qwen (#9622)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-10-29 04:14:07 +00:00
Russell Bryant c5d7fb9ddc
[Doc] fix third-party model example (#9771)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-28 19:39:21 -07:00
youkaichao 76ed5340f0
[torch.compile] add deepseek v2 compile (#9775)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-28 14:35:17 -07:00
youkaichao 97b61bfae6
[misc] avoid circular import (#9765)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-28 20:51:23 +00:00
Yongzao aa0addb397
Adding "torch compile" annotations to moe models (#9758)
2024-10-28 13:49:56 -07:00
litianjian 5f8d8075f9
[Model][VLM] Add multi-video support for LLaVA-Onevision (#8905)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-28 18:04:10 +00:00
Russell Bryant 8b0e4f2ad7
[CI/Build] Adopt Mergify for auto-labeling PRs (#9259)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-28 09:38:09 -07:00
Yan Ma 2adb4409e0
[Bugfix] Fix ray instance detect issue (#9439)
2024-10-28 07:13:03 +00:00
Robert Shaw feb92fbe4a
Fix beam search eos (#9627)
2024-10-28 06:59:37 +00:00
youkaichao 32176fee73
[torch.compile] support moe models (#9632)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 21:58:04 -07:00
wangshuai09 4e2d95e372
[Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
madt2709 34a9941620
[Bugfix] Fix load config when using bools (#9533)
2024-10-27 13:46:41 -04:00
Harry Mellor e130c40e4e
Fix cache management in "Close inactive issues and PRs" actions workflow (#9734)
2024-10-27 10:30:03 -07:00
bnellnm 3cb07a36a2
[Misc] Upgrade to pytorch 2.5 (#9588)
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
youkaichao 8549c82660
[core] cudagraph output with tensor weak reference (#9724)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-27 00:19:28 -07:00
科英 67a6882da4
[Misc] SpecDecodeWorker supports profiling (#9719)
Signed-off-by: Abatom <abatom@163.com>
2024-10-27 04:18:03 +00:00
kakao-kevin-us 6650e6a930
[Model] Add classification Task with Qwen2ForSequenceClassification (#9704)
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
2024-10-26 17:53:35 +00:00
Vasiliy Alekseev 07e981fdf4
[Frontend] Bad words sampling parameter (#9717)
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru>
2024-10-26 16:29:38 +00:00
ErkinSagiroglu 55137e8ee3
Fix: MI100 Support By Bypassing Custom Paged Attention (#9560)
2024-10-26 12:12:57 +00:00
Mengqing Cao 5cbdccd151
[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716)
2024-10-26 10:59:06 +00:00