1003 Commits

Author SHA1 Message Date
Laurent Mazare 4c338b0cd9
VarBuilder cleanup (#627)
* VarBuilder cleanup.

* Implement the basic varbuilders.

* Add the sharded code.

* Proper support for tensor sharding.
2023-08-27 18:03:26 +01:00
Laurent Mazare be471d50ab
Llama quantization. (#625) 2023-08-27 14:08:15 +01:00
Laurent Mazare 7151f2cf63
Add the quantize command. (#624)
* Add the quantize command.

* Bugfix for writing gguf files.

* And add a comment.
2023-08-27 11:35:19 +01:00
Laurent Mazare 6e485f2deb
Add some optional repeat penalty. (#623)
* Add some optional repeat penalty.

* Add the missing files.
2023-08-27 10:48:45 +01:00
Laurent Mazare 5320aa6b7d
Move the test-utils bits to a shared place. (#619) 2023-08-27 09:42:22 +01:00
Laurent Mazare a8b39dd7b7
Fix for q5_1 quantization. (#617)
* Fix for q5_1 quantization.

* Fix some typos.
2023-08-27 08:31:18 +01:00
Laurent Mazare fa0d75b18d
Quantization tests + fix some issues. (#616) 2023-08-27 08:17:38 +01:00
Laurent Mazare 28658054ff
More missing quantized bits. (#615)
* Q4_1 support.

* Add Q5_1 quantization.

* Tweak.
2023-08-27 07:52:26 +01:00
Laurent Mazare ab36a7f3e3
Fix for when f16c is not available. (#614) 2023-08-27 07:19:52 +01:00
Laurent Mazare f704e39761
Missing quants ops (#611)
* Another transmute tweak.

* Changelog tweak.

* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
Laurent Mazare fdf15f0e05
Another transmute tweak. (#610)
* Another transmute tweak.

* Changelog tweak.
2023-08-26 13:00:24 +01:00
Laurent Mazare 06b37ea7ad
Avoid using tmp values. (#609) 2023-08-26 12:28:28 +01:00
Lukas Kreussel c72eb3d75b
Add reference implementation for `q4k` and `q5k` (#586)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix

* `q4k` vec-dot

* `q5k` vec-dot

* Validate against GGML unit test results.

* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
Radamés Ajna 864227edbf
[WIP] Improve Yolo WASM UI example (#591)
* return detections with classes names

* ignore .DS_Store

* example how to load wasm module

* add param to set model size

* add param for model size

* accept iou and confidence threshold on run

* conf and iou thresholds

* clamp only

* remove images from branch

* a couple of renamings, add readme with instructions

* final design

* minor font + border update
2023-08-26 11:40:41 +01:00
Nicolas Patry b23b347b35
Merge pull request #601 from huggingface/repair_bf16_f16_cast
Repairing cast bf16/f16
2023-08-26 12:34:41 +02:00
Patrick von Platen 71518caeee
Align tensor device print more with PyTorch (#590)
* Improve tensor print

* Use CudaDevice only if enabled with cuda feature

* run rust fmt

* up

* improve

* rustfmt
2023-08-26 11:20:22 +01:00
Laurent Mazare 6559eae72c
Avoid some transmutes. (#607) 2023-08-25 18:21:37 +01:00
Laurent Mazare 46eb225ba5
Add some missing entries to the changelog. (#606) 2023-08-25 18:01:38 +01:00
Nicolas Patry aa67e5107d
Merge pull request #600 from huggingface/codellama_gpu_support
Adding support for codellama in examples.
2023-08-25 18:25:26 +02:00
Nicolas Patry c105550405 s/panic/bail/ 2023-08-25 18:05:07 +02:00
Laurent Mazare ca6c050b04
Cleanup the pose reporting code. (#605) 2023-08-25 16:49:21 +01:00
Laurent Mazare 9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. (#604)
* Neon intrinsics for the q8_0 vecdot.

* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
Laurent Mazare 0afbc435df
Add some configurable legend for yolo detection. (#603)
* Add some configurable legend for yolo detection.

* Clippyness.
2023-08-25 13:50:31 +01:00
Nicolas Patry d4e75d5825 Let's keep the dirty code on its own. 2023-08-25 12:01:58 +00:00
Nicolas Patry be371e827c Intermediary float cast is necessary for cuda 11.8 2023-08-25 11:54:30 +00:00
Laurent Mazare 97909e5068
Move the yolo model bits in a separate file. (#602)
* Move the yolo model bits in a separate file.

* Improve the drawing.

* Bugfix.
2023-08-25 12:47:55 +01:00
Nicolas Patry 1c1e34735e `static_cast` ? 2023-08-25 11:40:36 +00:00
Nicolas Patry db8bab8b7a Different casting ? 2023-08-25 10:49:22 +00:00
Nicolas Patry bc131b402b Repairing cast bf16/f16 2023-08-25 10:38:19 +00:00
Laurent Mazare 8bc5fffa45
More support for pose estimation in yolo-v8. (#599)
* More support for pose estimation in yolo-v8.

* Support both object detection and pose-estimation in the yolo-v8 example.
2023-08-25 11:21:11 +01:00
Nicolas Patry 4826a4212e Adding support for codellama in examples.
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
Laurent Mazare afc10a3232
AVX version for the q8-0 multiplications. (#598) 2023-08-25 10:14:49 +01:00
Laurent Mazare d728e646c2
Use resolver 2 explicitly. (#597) 2023-08-25 09:35:40 +01:00
Laurent Mazare c093b03d51
Generic implementation of vecdot for q80. (#596)
* Generic implementation of vecdot for q80.

* Add support for code-llama 7b.

* Support more code-llama.
2023-08-25 09:04:05 +01:00
Laurent Mazare d8ba0452dc
Fail on bf16. (#594) 2023-08-25 06:10:38 +01:00
Laurent Mazare 189442a0fa
Add the pose estimation head for yolo. (#589)
* Add the pose estimation head for yolo.

* Properly handle the added position dimensions.

* Integrate the pose estimation head in the forward pass.

* Renaming.

* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
Laurent Mazare 2cde0cb74b
More pickle support. (#588)
* More pickle support.

* Be more verbose.
2023-08-24 18:45:10 +01:00
Laurent Mazare e21c686cdc
Fixes for clippy 1.72. (#587) 2023-08-24 17:46:17 +01:00
Laurent Mazare c265ac50fa
Add a function to write gguf files. (#585)
* Add a function to write gguf files.

* More GGUF file writing.

* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
Nicolas Patry a87c6f7652
Merge pull request #561 from patrickvonplaten/add_installation
Improve installation section and "get started"
2023-08-24 16:25:52 +02:00
Laurent Mazare afd965f77c
More non square testing (#582)
* Add more non square testing.

* More testing.
2023-08-24 13:01:04 +01:00
Lukas Kreussel d2f42ab086
Reference implementations of `q2k` and `q3k` vec-dot functions (#580)
* add `q2k` vec-dot

* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
Laurent Mazare ca318a6ec7
Add to the cuda example a reproduction of the issue. (#579)
* Add to the cuda example a reproduction of the issue.

* Tweak.

* Add a test using non-square matrixes.

* Fix the conv2d kernel.

* Display the error.

* And tweak the comment.
2023-08-24 12:07:31 +01:00
Laurent Mazare dd64465899
Add a test for conv2d with padding + bugfix the random number generation on cuda. (#578)
* Add a test for conv2d with padding.

* Cosmetic changes.

* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
Laurent Mazare 79916c2edb
Use the hub weights for efficientnet. (#573) 2023-08-23 18:20:21 +01:00
Laurent Mazare 431051cc32
Add Efficientnet (#572)
* EfficientNet.

* Complete the efficientnet implementation.

* Improve group handling.

* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
Laurent Mazare eedd85ffa7
Move the imagenet specific bits to a separate file. (#571) 2023-08-23 16:42:09 +01:00
Laurent Mazare 7478dda255
Cosmetic tweaks. (#570) 2023-08-23 15:45:40 +01:00
Laurent Mazare 329f661d9b
Trace softmax (#568)
* Trace the softmax op.

* Inline the sum.

* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
Lukas Kreussel 075b505480
Mirror GGML's unit tests (#569)
* Add ggml unit tests

* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00