Commit Graph

281 Commits

Author SHA1 Message Date
Alex Brooks cc98f1e079
[CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-10-30 09:32:17 -07:00
youkaichao ff5ed6e1bc
[torch.compile] rework compile control with piecewise cudagraph (#9715)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-10-29 23:03:49 -07:00
Joe Runde ef7faad1b8
🐛 Fixup more test failures from memory profiling (#9563)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-21 17:10:56 -07:00
Cyrus Leung 696b01af8f
[CI/Build] Split up decoder-only LM tests (#9488)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-20 21:27:50 -07:00
bnellnm eca2c5f7c0
[Bugfix] Fix support for dimension like integers and ScalarType (#9299) 2024-10-17 19:08:34 +00:00
Daniele a2c71c5405
[CI/Build] remove .github from .dockerignore, add dirty repo check (#9375) 2024-10-17 10:25:06 -07:00
Kuntai Du 81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Li, Jiang 5eda21e773
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344) 2024-10-17 12:21:04 -04:00
Lucas Wilkinson 9d30a056e7
[misc] CUDA Time Layerwise Profiler (#8337)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-10-17 10:36:09 -04:00
Cyrus Leung 1de76a0e55
[CI/Build] Test VLM embeddings (#9406) 2024-10-16 09:44:30 +00:00
Daniele 203ab8f80f
[CI/Build] setuptools-scm fixes (#8900) 2024-10-14 11:34:47 -07:00
Tyler Michael Smith 7342a7d7f8
[Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
Kevin H. Luu a78c6ba7c8
[ci/build] Add placeholder command for custom models test (#9262) 2024-10-10 15:45:09 -07:00
youkaichao e4d652ea3e
[torch.compile] integration with compilation control (#9058) 2024-10-10 12:39:36 -07:00
sroy745 f3a507f1d3
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) 2024-10-10 14:17:17 +08:00
Li, Jiang ca77dd7a44
[Hardware][CPU] Support AWQ for CPU backend (#7515) 2024-10-09 10:28:08 -06:00
youkaichao c8627cd41b
[ci][test] use load dummy for testing (#9165) 2024-10-09 00:38:40 -07:00
Michael Goin 9ba0bd6aa6
Add `lm-eval` directly to requirements-test.txt (#9161) 2024-10-08 18:22:31 -07:00
Isotr0py 4f95ffee6f
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) 2024-10-07 06:50:35 +00:00
Kuntai Du fbb74420e7
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) 2024-10-04 14:01:44 -07:00
Murali Andoorveedu 0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Lily Liu 1570203864
[Spec Decode] (1/2) Remove batch expansion (#8839) 2024-10-01 16:04:42 -07:00
Lily Liu bce324487a
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975) 2024-10-01 00:51:40 +00:00
Kevin H. Luu 1425a1bcf9
[ci] Add CODEOWNERS for test directories (#8795)
Signed-off-by: kevin <kevin@anyscale.com>
2024-10-01 00:47:08 +00:00
Cyrus Leung e1a3f5e831
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-28 09:54:35 -07:00
Tyler Titsworth 260024a374
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-27 23:45:50 -07:00
youkaichao d86f6b2afb
[misc] fix wheel name (#8919) 2024-09-27 22:10:44 -07:00
Luka Govedič 172d1cd276
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271) 2024-09-27 14:25:10 -04:00
Nick Hill 4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
Kevin H. Luu d9cfbc891e
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
2024-09-26 15:02:16 -07:00
bnellnm 300da09177
[Kernel] Fullgraph and opcheck tests (#8479) 2024-09-25 08:35:52 -06:00
Hongxia Yang 1c046447a6
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777) 2024-09-25 22:26:37 +08:00
Alexey Kondratiev(AMD) 2940afa04e
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670) 2024-09-20 10:27:44 -07:00
Alexey Kondratiev(AMD) 6cb748e190
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551) 2024-09-19 13:06:32 -07:00
Alexander Matveev 7c7714d856
[Core][Bugfix][Perf] Introduce `MQLLMEngine` to avoid `asyncio` OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
Alexey Kondratiev(AMD) 09deb4721f
[CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520) 2024-09-17 16:40:29 -07:00
sroy745 1009e93c5d
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631) 2024-09-17 07:35:01 -07:00
youkaichao 99aa4eddaf
[torch.compile] register allreduce operations as custom ops (#8526) 2024-09-16 22:57:57 -07:00
Simon Mo 5478c4b41f
[perf bench] set timeout to debug hanging (#8516) 2024-09-16 14:30:02 -07:00
ywfang 8a0cf1ddc3
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
Cyrus Leung a84e598e21
[CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
Nick Hill 551ce01078
[Core] Add engine option to return only deltas or final output (#7381) 2024-09-12 12:02:00 -07:00
Joe Runde f2e263b801
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-12 11:11:57 -07:00
Lily Liu 775f00f81e
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-11 14:07:34 -07:00
Alexey Kondratiev(AMD) aea02f30de
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373) 2024-09-11 18:31:41 +00:00
Li, Jiang 0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257) 2024-09-11 09:46:46 -07:00
sumitd2 02751a7a42
Fix ppc64le buildkite job (#8309) 2024-09-10 12:58:34 -07:00
Alexey Kondratiev(AMD) f421f3cefb
[CI/Build] Enabling kernels tests for AMD, ignoring some of then that fail (#8130) 2024-09-10 11:51:15 -07:00
Dipika Sikka 6cd5e5b07e
[Misc] Fused MoE Marlin support for GPTQ (#8217) 2024-09-09 23:02:52 -04:00
sumitd2 b962ee1470
ppc64le: Dockerfile fixed, and a script for buildkite (#8026) 2024-09-07 11:18:40 -07:00