Dipika Sikka
56a955e774
Bump to compressed-tensors v0.8.0 ( #10279 )
...
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2024-11-12 21:54:10 -08:00
Guillaume Calmettes
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint ( #9837 )
2024-10-30 18:15:56 -07:00
Dipika Sikka
48138a8415
[BugFix] Stop silent failures on compressed-tensors parsing ( #9381 )
2024-10-17 18:54:00 -07:00
Cyrus Leung
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL ( #9250 )
2024-10-16 13:56:17 +08:00
Michael Goin
22f8a69549
[Misc] Directly use compressed-tensors for checkpoint definitions ( #8909 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-15 15:40:25 -07:00
Roger Wang
b6d7392579
[Misc][CI/Build] Include `cv2` via `mistral_common[opencv]` ( #8951 )
2024-09-30 04:28:26 +00:00
Tyler Titsworth
260024a374
[Bugfix][Intel] Fix XPU Dockerfile Build ( #7824 )
...
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-27 23:45:50 -07:00
Cyrus Leung
1b49148e47
[Installation] Allow lower versions of FastAPI to maintain Ray 2.9 compatibility ( #8764 )
2024-09-26 16:54:09 -07:00
Chen Zhang
770ec6024f
[Model] Add support for the multi-modal Llama 3.2 model ( #8811 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Chang Su <chang.s.su@oracle.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-25 13:29:32 -07:00
youkaichao
92ba7e7477
[misc] upgrade mistral-common ( #8715 )
2024-09-22 15:41:59 -07:00
youkaichao
d4a2ac8302
[build] enable existing pytorch (for GH200, aarch64, nightly) ( #8713 )
2024-09-22 12:47:54 -07:00
Joe Runde
cca61642e0
[Bugfix] Fix 3.12 builds on main ( #8510 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-17 00:01:45 +00:00
Isotr0py
fc990f9795
[Bugfix][Kernel] Add `IQ1_M` quantization implementation to GGUF kernel ( #8357 )
2024-09-15 16:51:44 -06:00
Cyrus Leung
ecd7a1d5b6
[Installation] Gate FastAPI version for Python 3.8 ( #8456 )
2024-09-13 09:02:26 -07:00
Cyrus Leung
3f79bc3d1a
[Bugfix] Bump fastapi and pydantic version ( #8435 )
2024-09-13 03:21:42 +00:00
Patrick von Platen
d394787e52
Pixtral ( #8377 )
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
Yang Fan
3b7fea770f
[Model][VLM] Add Qwen2-VL model support ( #7905 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
Joe Runde
cfe712bf1a
[CI/Build] Use python 3.12 in cuda image ( #8133 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-09-07 13:03:16 -07:00
William Lin
1afc931987
[bugfix] >1.43 constraint for openai ( #8169 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-04 17:35:36 -07:00
Kyle Mistele
e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models ( #5649 )
...
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
Roger Wang
5b86b19954
[Misc] Optional installation of audio related packages ( #8063 )
2024-09-01 14:46:57 -07:00
Kaunil Dhruv
058344f89a
[Frontend]-config-cli-args ( #7737 )
...
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
Patrick von Platen
6fc4e6e07a
[Model] Add Mistral Tokenization to improve robustness and chat encoding ( #7739 )
2024-08-27 12:40:02 +00:00
Cyrus Leung
baaedfdb2d
[mypy] Enable following imports for entrypoints ( #7248 )
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Fei <dfdfcai4@gmail.com>
2024-08-20 23:28:21 -07:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization ( #7109 )
2024-08-18 17:57:20 -07:00
Xander Johnson
7c0b7ea214
[Bugfix] add >= 1.0 constraint for openai dependency ( #7612 )
2024-08-16 20:56:01 -07:00
PHILO-HE
f4da5f7b6d
[Misc] Update dockerfile for CPU to cover protobuf installation ( #7182 )
2024-08-15 10:03:01 -07:00
Kyle Sayers
f55a9aea45
[Misc] Revert `compressed-tensors` code reuse ( #7521 )
2024-08-14 15:07:37 -07:00
youkaichao
16422ea76f
[misc][plugin] add plugin system implementation ( #7426 )
2024-08-13 16:24:17 -07:00
Kyle Sayers
373538f973
[Misc] `compressed-tensors` code reuse ( #7277 )
2024-08-13 19:05:15 -04:00
Peter Salas
00c3d68e45
[Frontend][Core] Add plumbing to support audio language models ( #7446 )
2024-08-13 17:39:33 +00:00
Daniele
774cd1d3bf
[CI/Build] bump minimum cmake version ( #6999 )
2024-08-12 16:29:20 -07:00
Noam Gat
4fb7b52a2c
Updating LM Format Enforcer version to v0.10.6 ( #7189 )
2024-08-11 08:11:50 -04:00
Cyrus Leung
7eb4a51c5f
[Core] Support serving encoder/decoder models ( #7258 )
2024-08-09 10:39:41 +08:00
Isotr0py
360bd67cf0
[Core] Support loading GGUF model ( #5191 )
...
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00
Michael Goin
421e218b37
[Bugfix] Bump transformers to 4.43.2 ( #6752 )
2024-07-24 13:22:16 -07:00
Roger Wang
1bedf210e3
Bump `transformers` version for Llama 3.1 hotfix and patch Chameleon ( #6690 )
2024-07-23 13:47:48 -07:00
Noam Gat
9da4aad44b
Updating LM Format Enforcer version to v10.3 ( #6411 )
2024-07-13 10:09:12 +00:00
Roger Wang
f7160d946a
[Misc][Bugfix] Update transformers for tokenizer issue ( #6364 )
2024-07-12 08:40:07 +00:00
Lim Xiang Yang
fc17110bbe
[BugFix]: set outlines pkg version ( #6262 )
2024-07-11 04:37:11 +00:00
youkaichao
da78caecfa
[core][distributed] zmq fallback for broadcasting large objects ( #6183 )
...
[core][distributed] add zmq fallback for broadcasting large objects (#6183 )
2024-07-09 18:49:11 -07:00
danieljannai21
2c37540aa6
[Frontend] Add template related params to request ( #5709 )
2024-07-01 23:01:57 -07:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 ( #5908 )
2024-06-27 13:33:56 -07:00
Cyrus Leung
e9c2732b97
[CI/Build] Add tqdm to dependencies ( #5680 )
2024-06-19 08:37:33 -06:00
youkaichao
f07d513320
[build][misc] limit numpy version ( #5582 )
2024-06-16 16:07:01 -07:00
Breno Faria
7b0a0dfb22
[Frontend][Core] Update Outlines Integration from `FSM` to `Guide` ( #4109 )
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Breno Faria <breno.faria@intrafind.com>
2024-06-05 16:49:12 -07:00
Cyrus Leung
7a64d24aad
[Core] Support image processor ( #4197 )
2024-06-02 22:56:41 -07:00
Michael Goin
86b45ae065
[Bugfix] Relax tiktoken to >= 0.6.0 ( #4890 )
2024-05-17 12:58:52 -06:00
Alex Wu
52f8107cf2
[Frontend] Support OpenAI batch file format ( #4794 )
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-15 19:13:36 -04:00
Noam Gat
bd99d22629
Update lm-format-enforcer to 0.10.1 ( #4631 )
2024-05-06 23:51:59 +00:00