Laurent Mazare
4c338b0cd9
VarBuilder cleanup ( #627 )
...
* VarBuilder cleanup.
* Implement the basic varbuilders.
* Add the sharded code.
* Proper support for tensor sharding.
2023-08-27 18:03:26 +01:00
Laurent Mazare
be471d50ab
Llama quantization. ( #625 )
2023-08-27 14:08:15 +01:00
Laurent Mazare
7151f2cf63
Add the quantize command. ( #624 )
...
* Add the quantize command.
* Bugfix for writing gguf files.
* And add a comment.
2023-08-27 11:35:19 +01:00
Laurent Mazare
6e485f2deb
Add some optional repeat penalty. ( #623 )
...
* Add some optional repeat penalty.
* Add the missing files.
2023-08-27 10:48:45 +01:00
Laurent Mazare
5320aa6b7d
Move the test-utils bits to a shared place. ( #619 )
2023-08-27 09:42:22 +01:00
Laurent Mazare
a8b39dd7b7
Fix for q5_1 quantization. ( #617 )
...
* Fix for q5_1 quantization.
* Fix some typos.
2023-08-27 08:31:18 +01:00
Laurent Mazare
fa0d75b18d
Quantization tests + fix some issues. ( #616 )
2023-08-27 08:17:38 +01:00
Laurent Mazare
28658054ff
More missing quantized bits. ( #615 )
...
* Q4_1 support.
* Add Q5_1 quantization.
* Tweak.
2023-08-27 07:52:26 +01:00
Laurent Mazare
ab36a7f3e3
Fix for when f16c is not available. ( #614 )
2023-08-27 07:19:52 +01:00
Laurent Mazare
f704e39761
Missing quants ops ( #611 )
...
* Another transmute tweak.
* Changelog tweak.
* Add some missing quantized ops.
2023-08-26 20:09:04 +01:00
Laurent Mazare
fdf15f0e05
Another transmute tweak. ( #610 )
...
* Another transmute tweak.
* Changelog tweak.
2023-08-26 13:00:24 +01:00
Laurent Mazare
06b37ea7ad
Avoid using tmp values. ( #609 )
2023-08-26 12:28:28 +01:00
Lukas Kreussel
c72eb3d75b
Add reference implementation for `q4k` and `q5k` ( #586 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
* `q4k` vec-dot
* `q5k` vec-dot
* Validate against GGML unit test results.
* Remove some more `transmutes`
2023-08-26 12:07:54 +01:00
Radamés Ajna
864227edbf
[WIP] Improve Yolo WASM UI example ( #591 )
...
* return detections with classes names
* ignore .DS_Store
* example how to load wasm module
* add param to set model size
* add param for model size
* accept iou and confidence threshold on run
* conf and iou thresholds
* clamp only
* remove images from branch
* a couple of renamings, add readme with instructions
* final design
* minor font + border update
2023-08-26 11:40:41 +01:00
Nicolas Patry
b23b347b35
Merge pull request #601 from huggingface/repair_bf16_f16_cast
...
Repairing cast bf16/f16
2023-08-26 12:34:41 +02:00
Patrick von Platen
71518caeee
Align tensor device print more with PyTorch ( #590 )
...
* Improve tensor print
* Use CudaDevice only if enabled with cuda feature
* run rust fmt
* up
* improve
* rustfmt
2023-08-26 11:20:22 +01:00
Laurent Mazare
6559eae72c
Avoid some transmutes. ( #607 )
2023-08-25 18:21:37 +01:00
Laurent Mazare
46eb225ba5
Add some missing entries to the changelog. ( #606 )
2023-08-25 18:01:38 +01:00
Nicolas Patry
aa67e5107d
Merge pull request #600 from huggingface/codellama_gpu_support
...
Adding support for codellama in examples.
2023-08-25 18:25:26 +02:00
Nicolas Patry
c105550405
s/panic/bail/
2023-08-25 18:05:07 +02:00
Laurent Mazare
ca6c050b04
Cleanup the pose reporting code. ( #605 )
2023-08-25 16:49:21 +01:00
Laurent Mazare
9c8d6dbc2a
Neon intrinsics for the q8_0 vecdot. ( #604 )
...
* Neon intrinsics for the q8_0 vecdot.
* Get the tests to run with accelerate (with some numerical error failures).
2023-08-25 14:42:18 +01:00
Laurent Mazare
0afbc435df
Add some configurable legend for yolo detection. ( #603 )
...
* Add some configurable legend for yolo detection.
* Clippyness.
2023-08-25 13:50:31 +01:00
Nicolas Patry
d4e75d5825
Let's keep the dirty code on its own.
2023-08-25 12:01:58 +00:00
Nicolas Patry
be371e827c
Intermediary float cast is necessary for cuda 11.8
2023-08-25 11:54:30 +00:00
Laurent Mazare
97909e5068
Move the yolo model bits in a separate file. ( #602 )
...
* Move the yolo model bits in a separate file.
* Improve the drawing.
* Bugfix.
2023-08-25 12:47:55 +01:00
Nicolas Patry
1c1e34735e
`static_cast` ?
2023-08-25 11:40:36 +00:00
Nicolas Patry
db8bab8b7a
Different casting ?
2023-08-25 10:49:22 +00:00
Nicolas Patry
bc131b402b
Repairing cast bf16/f16
2023-08-25 10:38:19 +00:00
Laurent Mazare
8bc5fffa45
More support for pose estimation in yolo-v8. ( #599 )
...
* More support for pose estimation in yolo-v8.
* Support both object detection and pose-estimation in the yolo-v8 example.
2023-08-25 11:21:11 +01:00
Nicolas Patry
4826a4212e
Adding support for codellama in examples.
...
Codellama requires bf16 for now (error to convert from bf16 to f16).
Multiprocess demo not functional for it because flash-attn only supports
f16 for now.
2023-08-25 09:56:11 +00:00
Laurent Mazare
afc10a3232
AVX version for the q8-0 multiplications. ( #598 )
2023-08-25 10:14:49 +01:00
Laurent Mazare
d728e646c2
Use resolver 2 explicitly. ( #597 )
2023-08-25 09:35:40 +01:00
Laurent Mazare
c093b03d51
Generic implementation of vecdot for q80. ( #596 )
...
* Generic implementation of vecdot for q80.
* Add support for code-llama 7b.
* Support more code-llama.
2023-08-25 09:04:05 +01:00
Laurent Mazare
d8ba0452dc
Fail on bf16. ( #594 )
2023-08-25 06:10:38 +01:00
Laurent Mazare
189442a0fa
Add the pose estimation head for yolo. ( #589 )
...
* Add the pose estimation head for yolo.
* Properly handle the added position dimensions.
* Integrate the pose estimation head in the forward pass.
* Renaming.
* Fix for pose estimation.
2023-08-24 22:12:34 +01:00
Laurent Mazare
2cde0cb74b
More pickle support. ( #588 )
...
* More pickle support.
* Be more verbose.
2023-08-24 18:45:10 +01:00
Laurent Mazare
e21c686cdc
Fixes for clippy 1.72. ( #587 )
2023-08-24 17:46:17 +01:00
Laurent Mazare
c265ac50fa
Add a function to write gguf files. ( #585 )
...
* Add a function to write gguf files.
* More GGUF file writing.
* Write the tensor data in GGUF files.
2023-08-24 17:03:06 +01:00
Nicolas Patry
a87c6f7652
Merge pull request #561 from patrickvonplaten/add_installation
...
Improve installation section and "get started"
2023-08-24 16:25:52 +02:00
Laurent Mazare
afd965f77c
More non square testing ( #582 )
...
* Add more non square testing.
* More testing.
2023-08-24 13:01:04 +01:00
Lukas Kreussel
d2f42ab086
Reference implementations of `q2k` and `q3k` vec-dot functions ( #580 )
...
* add `q2k` vec-dot
* `q3k` vec-dot + quantization bugfix
2023-08-24 12:35:54 +01:00
Laurent Mazare
ca318a6ec7
Add to the cuda example a reproduction of the issue. ( #579 )
...
* Add to the cuda example a reproduction of the issue.
* Tweak.
* Add a test using non-square matrices.
* Fix the conv2d kernel.
* Display the error.
* And tweak the comment.
2023-08-24 12:07:31 +01:00
Laurent Mazare
dd64465899
Add a test for conv2d with padding + bugfix the random number generation on cuda. ( #578 )
...
* Add a test for conv2d with padding.
* Cosmetic changes.
* Bugfix the rand function on the cuda backend.
2023-08-24 10:16:37 +01:00
Laurent Mazare
79916c2edb
Use the hub weights for efficientnet. ( #573 )
2023-08-23 18:20:21 +01:00
Laurent Mazare
431051cc32
Add Efficientnet ( #572 )
...
* EfficientNet.
* Complete the efficientnet implementation.
* Improve group handling.
* Get the efficientnet to work.
2023-08-23 18:02:58 +01:00
Laurent Mazare
eedd85ffa7
Move the imagenet specific bits to a separate file. ( #571 )
2023-08-23 16:42:09 +01:00
Laurent Mazare
7478dda255
Cosmetic tweaks. ( #570 )
2023-08-23 15:45:40 +01:00
Laurent Mazare
329f661d9b
Trace softmax ( #568 )
...
* Trace the softmax op.
* Inline the sum.
* Add min/max vec operations.
2023-08-23 15:25:50 +01:00
Lukas Kreussel
075b505480
Mirror GGML's unit tests ( #569 )
...
* Add ggml unit tests
* simplify random matmul test for other test cases
2023-08-23 15:25:17 +01:00