Commit Graph

296 Commits

Laurent Mazare aa53368aeb
Better control on the optional dequantization in QMatMul (#1049)
* Cosmetic change to the quantized whisper model.

* Fix the dequantization.

* Add the dequantize-all variable.
2023-10-07 10:16:18 +01:00
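Note: the commit above makes dequantization in QMatMul optional. For context, a hedged sketch of what block dequantization involves, using a GGML Q8_0-style layout (32 signed 8-bit quants sharing one f32 scale per block); the type and function names are illustrative, not candle's actual API:

```rust
// Block dequantization sketch in the GGML Q8_0 style: each block holds
// 32 signed 8-bit quants and one f32 scale; dequantizing expands them
// back to f32 values via scale * quant.
struct BlockQ8 {
    scale: f32,
    quants: [i8; 32],
}

fn dequantize(blocks: &[BlockQ8]) -> Vec<f32> {
    let mut out = Vec::with_capacity(blocks.len() * 32);
    for b in blocks {
        for &q in &b.quants {
            out.push(b.scale * q as f32);
        }
    }
    out
}

fn main() {
    let block = BlockQ8 { scale: 0.5, quants: [2; 32] };
    let values = dequantize(&[block]);
    assert!(values.iter().all(|&v| (v - 1.0).abs() < 1e-6));
}
```

Keeping the weights quantized saves memory but needs dedicated kernels; dequantizing once up front lets the regular f32 matmul run instead, which is the trade-off the flag controls.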
Laurent Mazare d5f7267087
Add the stable-lm example. (#1046)
* Add the stable-lm example.

* Get stable-lm to generate some proper text.
2023-10-06 19:20:35 +01:00
Laurent Mazare b0442eff8a
Sketch the stable-lm model. (#1045) 2023-10-06 18:19:06 +01:00
Laurent Mazare 4631c48273
Remove some todos. (#1042) 2023-10-05 22:42:20 +01:00
Juarez Bochi f47bd9bab5
Delete invalid comment (#1038) 2023-10-05 19:28:08 +01:00
Laurent Mazare 089fc3b584
Improve the quantized whisper setup. (#1018)
* Improve the quantized whisper setup.

* Fix the config file paths.

* Use the standard matmul where possible.
2023-10-02 17:17:46 +01:00
Laurent Mazare e04c789230
Add a quantized variant of whisper (#1017)
* Add the quantized-whisper model.

* Quantize the whisper model.

* Adapt the whisper example to handle quantization.

* Add the quantized flag.

* Load the proper weights.
2023-10-02 14:59:53 +01:00
Laurent Mazare 096dee7073
Bump the version to 0.3.0. (#1014)
* Bump the version to 0.3.0.

* Changelog update.
2023-10-01 13:51:57 +01:00
Laurent Mazare deee7612da
Quantized version of mistral. (#1009)
* Quantized version of mistral.

* Integrate the quantized mistral variant.

* Use the quantized weight files.

* Tweak the quantization command.

* Fix the dtype when computing the rotary embeddings.

* Update the readme with the quantized version.

* Fix the decoding of the remaining tokens.
2023-09-30 18:25:47 +01:00
Laurent Mazare 4021272875
Use flash-attn for mistral. (#1004) 2023-09-30 12:15:10 +01:00
Laurent Mazare 6203ced495
Add negative prompts to segment-anything. (#1000) 2023-09-30 06:17:42 +01:00
Laurent Mazare d188d6a764
Fix the multiple points case for sam. (#998) 2023-09-29 22:39:43 +02:00
Laurent Mazare 53510ce427
Use a silu activation in mistral. (#991) 2023-09-29 07:06:54 +01:00
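Note: the SiLU (a.k.a. swish) activation adopted above is simply x · sigmoid(x); a minimal self-contained sketch:

```rust
// SiLU activation: silu(x) = x * sigmoid(x) = x / (1 + e^{-x}).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

fn main() {
    assert_eq!(silu(0.0), 0.0);
    println!("silu(1.0) = {}", silu(1.0)); // ~0.731
}
```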
Laurent Mazare 23b3576c47
Add the sliding window. (#986) 2023-09-28 17:26:33 +01:00
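Note: the sliding window added above bounds how far back each position can attend. A hedged sketch of the corresponding causal mask (the boolean layout is illustrative, not candle's exact mask code):

```rust
// Causal attention mask with a sliding window: position i may attend to
// position j only when j <= i (causality) and i - j < window (locality).
fn sliding_window_mask(len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..len)
        .map(|i| (0..len).map(|j| j <= i && i - j < window).collect())
        .collect()
}

fn main() {
    let mask = sliding_window_mask(5, 3);
    // Position 4 sees positions 2, 3, 4 but no longer 0 and 1.
    assert_eq!(mask[4], vec![false, false, true, true, true]);
}
```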
Laurent Mazare 716ab2ccdc
Mistral gpu fix (#985)
* Add the mistral example.

* Use the two model files.

* Adjust the dtype.

* Tweak the weight paths.

* Remove the end of text token.

* Get the mistral model to generate some text.

* Fix when running on the gpu.

* More gpu fixes.
2023-09-28 16:38:13 +01:00
Laurent Mazare ada8851a23
Add the mistral example. (#984)
* Add the mistral example.

* Use the two model files.

* Adjust the dtype.

* Tweak the weight paths.

* Remove the end of text token.

* Get the mistral model to generate some text.
2023-09-28 16:19:18 +01:00
Laurent Mazare c05a348e36
Add the Mistral 7b model (#983)
* Start sketching the mistral 7b model.

* Add the kv cache.

* Add the decoder layer.

* Add the mistral model.

* Rotary embeddings.

* Add the attention mask.
2023-09-28 14:29:41 +01:00
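Note: the rotary embeddings mentioned above rotate each consecutive feature pair of the queries and keys by a position-dependent angle. A minimal sketch of the idea (candle's implementation differs in layout and batching; the base theta of 10000 is the usual default, assumed here):

```rust
// Rotary position embedding on a single head vector: each feature pair
// (x[2k], x[2k+1]) is rotated by the angle pos * theta^{-2k/d}.
fn apply_rope(x: &mut [f32], pos: usize, theta: f32) {
    let d = x.len();
    for k in 0..d / 2 {
        let freq = 1.0 / theta.powf(2.0 * k as f32 / d as f32);
        let angle = pos as f32 * freq;
        let (sin, cos) = angle.sin_cos();
        let (a, b) = (x[2 * k], x[2 * k + 1]);
        x[2 * k] = a * cos - b * sin;
        x[2 * k + 1] = a * sin + b * cos;
    }
}

fn main() {
    let mut q = vec![1.0, 0.0, 1.0, 0.0];
    apply_rope(&mut q, 1, 10_000.0);
    println!("{q:?}");
}
```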
Laurent Mazare ce0a4e3a85
Use the gelu-erf activation. (#969) 2023-09-26 22:30:21 +01:00
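Note: gelu-erf is the exact GELU formulation, 0.5 · x · (1 + erf(x/√2)), as opposed to the faster tanh approximation. Rust's std has no erf, so this sketch substitutes the Abramowitz–Stegun 7.1.26 polynomial approximation (max error about 1.5e-7); candle computes erf its own way:

```rust
// erf via the Abramowitz-Stegun 7.1.26 polynomial approximation.
fn erf(x: f32) -> f32 {
    let sign = x.signum();
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

// Exact GELU: gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))).
fn gelu_erf(x: f32) -> f32 {
    0.5 * x * (1.0 + erf(x / std::f32::consts::SQRT_2))
}

fn main() {
    assert!(gelu_erf(0.0).abs() < 1e-6);
    println!("gelu_erf(1.0) = {}", gelu_erf(1.0)); // ~0.8413
}
```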
Laurent Mazare 1fcac4afed
Expose a function to clear the KV cache on mixformers. (#964) 2023-09-26 05:41:07 +01:00
Laurent Mazare a36d883254
Use a single flag for the point argument. (#958) 2023-09-25 12:53:24 +01:00
GeauxEric 7f2bbcf746
[segment-anything] Support multi-point as the prompt input (#945)
* [sam] Support multi-point prompts

* [segment-anything] Pass points by reference

* [segment-anything] Update example code and image

* Fix clippy lint.

---------

Co-authored-by: Yun Ding <yunding@nvidia.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-09-25 12:14:10 +01:00
Laurent Mazare 0007ae9c11
Add the quantized mixformer model. (#953)
* Add the quantized mixformer model.

* Add the quantized option in the phi example.
2023-09-24 15:03:48 +01:00
Laurent Mazare e15862cfdb
Shared the quantized var-builder code. (#952)
* Shared the quantized var-builder code.

* Fix compilation.
2023-09-24 12:55:07 +01:00
Laurent Mazare bb3471ea31
Adapt more examples to the updated safetensor api. (#947)
* Simplify the safetensor usage.

* Convert more examples.

* Move more examples.

* Adapt stable-diffusion.
2023-09-23 21:26:03 +01:00
Laurent Mazare 7582937a32
Add the causal mask in mixformer. (#937) 2023-09-23 09:50:26 +01:00
Laurent Mazare b54acfa3d0
Tracing for the phi model (#936)
* Add some tracing bits to mixformers.

* Add the missing file.

* Add the conv2d layer to with-tracing.

* Improve the tracing usage.
2023-09-23 09:19:34 +01:00
Laurent Mazare df6f5240ba
Complete the mixformer implementation. (#930)
* Complete the mixformers implementation.

* Tweak the attention.

* Add the phi-1.5 example.

* Improve the phi example.

* Bugfix.

* Get the phi example to work.
2023-09-22 20:03:16 +01:00
Laurent Mazare a46b1b4657
Mixformer (#929)
* Sketch the mixformer model.

* More modeling code.

* More mixformers.

* MixFormer creation.

* More mixformers.
2023-09-22 16:17:14 +01:00
Radamés Ajna 19e52e5007
T5 Wasm (#918)
* init t5 wasm model

* split workers for each model

* clean up

* add some ui

* readme

* index

* typo

* remove cache param, clear_kv_cache

* add max_length as param

* add model tasks option to ui

* add method to load quantized gguf from buffer

* Add quantized wasm module

* add quantized models to UI, dynamic import wasms

* link to quantized

* fix copy

* fix ModelEncoder

* fix README.md
2023-09-22 15:31:10 +01:00
Laurent Mazare 3b557765e8
T5 quantized example (#922)
* Load gguf files for the quantized t5.

* Add the quantized t5 example.

* Allow for loading local files.

* Add some support for quantizing safetensor files.

* Transpose before quantizing.

* Quantized t5.

* Retrieve the weights from the hub.
2023-09-21 12:33:15 +01:00
Laurent Mazare 2619c4307f
Add a quantized version of the t5 model. (#921) 2023-09-21 11:13:39 +01:00
Laurent Mazare c89b82b2d4
Add a clear cache function to the t5 model. (#919) 2023-09-21 09:01:06 +01:00
Laurent Mazare ab1d40ea97
Add more t5 tracing. (#915) 2023-09-20 20:20:54 +01:00
Laurent Mazare 3a0d3e05df
Add more t5 tracing. (#914)
* Add more t5 tracing.

* Revert the sm change.
2023-09-20 16:37:51 +01:00
Laurent Mazare 9b24d89d2d
Tracing mode for T5. (#913)
* Tracing mode for T5.

* Tracing for the linear layer.
2023-09-20 15:03:35 +01:00
Laurent Mazare fb1c2ac535
Add flash-attn support. (#912)
* Add flash-attn support.

* Add the use-flash-attn flag.

* Re-enable flash-attn.
2023-09-20 14:07:55 +01:00
Laurent Mazare f685b2231c
Add some missing biases. (#908) 2023-09-20 10:14:51 +01:00
Juarez Bochi 05626ef492
Flan T5: Read lm_head when word embeddings are not tied (#903)
* Read lm_head when word embeddings are not tied

* Fix formatting

* Address comments
2023-09-19 22:36:47 +01:00
Laurent Mazare 67a486d18d
Line-up the wuerstchen model with the python implementation. (#901)
* Line-up the wuerstchen model with the python implementation.

* Missing cos.

* Fix the picture denormalization.
2023-09-19 21:59:44 +01:00
Juarez Bochi 8696f64bae
Fix T5 kv cache (#899)
* Fix T5 kv cache

* Add argument for decoder prompt

* Fix range
2023-09-19 20:36:15 +01:00
Laurent Mazare 4f91c8e109
Improve the error message on shape mismatch for cat. (#897)
* Improve the error message on shape mismatch for cat.

* Cosmetic tweak.
2023-09-19 15:09:47 +01:00
Laurent Mazare 06e46d7c3b
Only use classifier free guidance for the prior. (#896)
* Only use classifier free guidance for the prior.

* Add another specific layer-norm structure.

* Tweaks.

* Fix the latent shape.

* Print the prior shape.

* More shape fixes.

* Remove some debugging `continue` statements.
2023-09-19 14:13:05 +01:00
Laurent Mazare 92db8cecd3
Specialized attention module for Wuerstchen. (#890)
* Specialized attention module for Wuerstchen.

* Reshaping ops.

* Attention processor.

* Finish the forward pass.

* Hook the new attention processor.

* Get the prior forward pass to work.

* Make it contiguous.
2023-09-18 21:16:09 +01:00
Laurent Mazare 82a98f6da0
Prior denoising. (#889) 2023-09-18 16:51:38 +01:00
Laurent Mazare 5082954c52
Fix the W clip embeddings. (#887)
* Fix the W clip embeddings.

* Add the specialized ddpm scheduler.
2023-09-18 14:50:14 +01:00
Laurent Mazare 7dd8e12472
Bump the crate versions to v0.2.3. (#886)
* Bump the crate version.

* Also update the python bindings.
2023-09-18 12:14:03 +01:00
Laurent Mazare c2b866172a
More Wuerstchen fixes. (#882)
* More Wuerstchen fixes.

* More shape fixes.

* Add more of the prior specific bits.

* Broadcast add.

* Fix the clip config.

* Add some masking options to the clip model.
2023-09-17 22:08:11 +01:00
Laurent Mazare 06cc329e71
Remove the parameters for the Wuerstchen layer-norm. (#879)
* Remove the parameters for the Wuerstchen layer-norm.

* Fixes.

* More fixes (including conv-transpose2d).

* More fixes.

* Again more fixes.
2023-09-17 15:59:27 +01:00
Laurent Mazare 5f83c13f17
Add the DDPM scheduler. (#877)
* Add the DDPM scheduler.

* Minor tweaks.
2023-09-17 15:03:01 +01:00
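Note: one reverse step of the DDPM scheduler added above removes a slice of predicted noise: x_{t-1} = (x_t − β_t/√(1−ᾱ_t) · ε̂) / √α_t + σ_t · z. A hedged sketch of just that update rule, with all schedule bookkeeping elided:

```rust
// One DDPM reverse step, where eps_hat is the model's noise prediction and
// z is fresh Gaussian noise (taken to be zero at the final step).
fn ddpm_step(x: &[f32], eps_hat: &[f32], z: &[f32], alpha: f32, alpha_bar: f32, sigma: f32) -> Vec<f32> {
    let beta = 1.0 - alpha;
    let coef = beta / (1.0 - alpha_bar).sqrt();
    x.iter()
        .zip(eps_hat)
        .zip(z)
        .map(|((&x, &e), &z)| (x - coef * e) / alpha.sqrt() + sigma * z)
        .collect()
}

fn main() {
    let x_prev = ddpm_step(&[1.0], &[0.5], &[0.0], 0.99, 0.9, 0.0);
    println!("{x_prev:?}");
}
```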
Laurent Mazare db3e9dae04
Wuerstchen main (#876)
* Wuerstchen main.

* More of the wuerstchen cli example.

* Paella creation.

* Build the prior model.

* Fix the weight file names.
2023-09-17 12:46:38 +01:00
Laurent Mazare 7f65af1f0d
Avoid re-encoding the input in the T5 example. (#875) 2023-09-17 10:25:54 +01:00
Laurent Mazare 1a276b5da7
Add a KV cache to T5. (#873)
* Add a KV cache to T5.

* Suggest using release mode.

* Use the kv cache in decoding.

* Add a comment.
2023-09-17 08:00:45 +01:00
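Note: the KV cache added above stores the keys and values from earlier decoding steps so each new token only computes its own projections. A minimal sketch of the pattern, including the clear operation a later commit exposes (plain Vecs stand in for tensors here):

```rust
// Minimal KV-cache sketch: keep keys/values from earlier steps and append
// each new step's entries instead of recomputing the whole prefix.
#[derive(Default)]
struct KvCache {
    keys: Vec<Vec<f32>>,
    values: Vec<Vec<f32>>,
}

impl KvCache {
    fn append(&mut self, k: Vec<f32>, v: Vec<f32>) -> (&[Vec<f32>], &[Vec<f32>]) {
        self.keys.push(k);
        self.values.push(v);
        (&self.keys, &self.values)
    }
    // Reset between unrelated sequences.
    fn clear(&mut self) {
        self.keys.clear();
        self.values.clear();
    }
}

fn main() {
    let mut cache = KvCache::default();
    cache.append(vec![0.1, 0.2], vec![0.3, 0.4]);
    let (k, _v) = cache.append(vec![0.5, 0.6], vec![0.7, 0.8]);
    assert_eq!(k.len(), 2); // attention now sees both steps' keys
}
```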
Juarez Bochi 3e49f8fce5
Implement T5 decoding (#864)
* Load t5 decoder

* Run enc, dec, and lm head, but no cross attn

* Cross-attention over key_value_states

* New arg for decoder input ids

* Add mask, don't forward position biases through decoder

* Update t5 examples

* Clippy + rustfmt
2023-09-15 22:05:12 +02:00
Laurent Mazare c2007ac88f
W fixes. (#862) 2023-09-15 15:11:11 +01:00
Laurent Mazare 30be5b6660
Replication pad (#861)
* Add the embed mapper convolutions.

* Add the replication pad layer.

* Use the replication-pad op.

* Tweak a todo.
2023-09-15 14:06:21 +01:00
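Note: replication padding, used above, extends a signal by repeating its edge values. A self-contained 1D sketch (the 2D op applies the same idea along both axes):

```rust
// Replication padding in 1D: repeat the first and last values `pad` times.
fn replication_pad1d(xs: &[f32], pad: usize) -> Vec<f32> {
    let mut out = Vec::with_capacity(xs.len() + 2 * pad);
    out.extend(std::iter::repeat(xs[0]).take(pad));
    out.extend_from_slice(xs);
    out.extend(std::iter::repeat(*xs.last().unwrap()).take(pad));
    out
}

fn main() {
    assert_eq!(
        replication_pad1d(&[1.0, 2.0, 3.0], 2),
        vec![1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
    );
}
```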
Laurent Mazare 107d3d9530
Add the embed mapper convolutions. (#860) 2023-09-15 11:38:38 +02:00
Laurent Mazare 2746f2c4be
DiffNeXt/unet (#859)
* DiffNeXt/unet

* Start adding the vae.

* VAE residual block.

* VAE forward pass.

* Add pixel shuffling.

* Actually use pixel shuffling.
2023-09-15 10:14:02 +01:00
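Note: pixel shuffling, added above for the VAE, rearranges a (c·r², h, w) tensor into (c, h·r, w·r), trading channel depth for spatial resolution. A self-contained sketch on flat arrays (candle expresses this with reshapes and transposes):

```rust
// Pixel shuffle with upscale factor r: input channel ci*r*r + dy*r + dx
// lands at output offset (dy, dx) within each upscaled r x r cell.
fn pixel_shuffle(input: &[f32], c: usize, h: usize, w: usize, r: usize) -> Vec<f32> {
    let mut out = vec![0.0; c * h * r * w * r];
    for ci in 0..c {
        for dy in 0..r {
            for dx in 0..r {
                let in_ch = ci * r * r + dy * r + dx;
                for y in 0..h {
                    for x in 0..w {
                        let src = (in_ch * h + y) * w + x;
                        let dst = (ci * h * r + y * r + dy) * w * r + x * r + dx;
                        out[dst] = input[src];
                    }
                }
            }
        }
    }
    out
}

fn main() {
    // 4 input channels, 1 logical output channel, r = 2: 1x1 becomes 2x2.
    let out = pixel_shuffle(&[1.0, 2.0, 3.0, 4.0], 1, 1, 1, 2);
    assert_eq!(out, vec![1.0, 2.0, 3.0, 4.0]);
}
```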
Laurent Mazare 130fe5a087
Add the upblocks. (#853) 2023-09-14 22:24:56 +01:00
Laurent Mazare 91ec546feb
More DiffNeXt. (#847)
* More DiffNeXt.

* Down blocks.
2023-09-14 22:16:31 +02:00
Laurent Mazare 0a647875ec
Use softmax-last-dim in the quantized example. (#848) 2023-09-14 17:29:24 +01:00
Laurent Mazare a0c6d5548c
Add the attention block. (#846)
* Add the attention block.

* Add more to clipnext.
2023-09-14 15:40:09 +01:00
Laurent Mazare 286f01db14
Start adding the Wuerstchen diffusion pipeline (#843)
* Wuerstchen common bits.

* Add the prior layer.

* Start adding diffnext.
2023-09-14 10:56:07 +01:00
Juarez Bochi 49d3f7f708
Add support to flan-t5 (#840) 2023-09-13 19:27:20 +02:00
Laurent Mazare 3e94324012
Add some sentence similarity part to the t5 example. (#835)
* Add some sentence similarity part to the t5 example.

* Clippy fix.
2023-09-13 10:44:02 +01:00
Laurent Mazare e4553fb355
T5 tweaks (#831)
* Use default values rather than options.

* Avoid exposing the device field.

* More tweaks.
2023-09-13 07:37:04 +01:00
Laurent Mazare d801e1d564
Clippy fix. (#830) 2023-09-13 07:16:20 +01:00
Juarez Bochi 9daa6dbe87
Extract T5 module and add main function to use it (#829)
* Extract t5 out of musicgen

* Add main for t5 module
2023-09-13 07:14:05 +01:00
Juarez Bochi 805bf9ffa7
Implement top_p / nucleus sampling (#819)
* Implement top_p / nucleus sampling

* Update changelog

* rustfmt

* Add tests

* Fix clippy warning

* Fix another clippy error
2023-09-12 18:10:16 +02:00
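Note: top-p (nucleus) sampling keeps the smallest set of tokens whose probabilities sum to at least p, then samples within that set. A dependency-free sketch, where the uniform draw `u` is passed in rather than generated (an assumption made to keep the example self-contained):

```rust
// Top-p / nucleus sampling: sort tokens by probability, keep the smallest
// prefix whose cumulative probability reaches p, renormalize, and sample.
fn sample_top_p(probs: &[f32], p: f32, u: f32) -> usize {
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a])); // descending
    // Find the nucleus: smallest prefix with cumulative probability >= p.
    let mut cum = 0.0;
    let mut cutoff = idx.len();
    for (i, &t) in idx.iter().enumerate() {
        cum += probs[t];
        if cum >= p {
            cutoff = i + 1;
            break;
        }
    }
    let nucleus = &idx[..cutoff];
    let total: f32 = nucleus.iter().map(|&t| probs[t]).sum();
    // Sample from the renormalized nucleus using the uniform draw u in [0, 1).
    let mut target = u * total;
    for &t in nucleus {
        target -= probs[t];
        if target <= 0.0 {
            return t;
        }
    }
    nucleus[nucleus.len() - 1]
}

fn main() {
    let probs = [0.5, 0.3, 0.15, 0.05];
    // With p = 0.8 only the two most likely tokens are kept.
    assert_eq!(sample_top_p(&probs, 0.8, 0.9), 1);
    assert_eq!(sample_top_p(&probs, 0.8, 0.1), 0);
}
```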
Laurent Mazare 2257f4d475
Bump the crate version + update the changelog. (#822) 2023-09-12 06:39:24 +01:00
Laurent Mazare c5a058b169
Use the module trait in stable-diffusion. (#817) 2023-09-11 20:40:07 +01:00
Laurent Mazare d7b9fec849
Move the stable-diffusion modeling code so that it's easier to re-use. (#812) 2023-09-11 11:45:57 +01:00
Laurent Mazare 84ee870efd
Use softmax-last-dim in whisper. (#810) 2023-09-11 11:05:05 +01:00
Laurent Mazare 90e077e409
Return the low res mask in the wasm segment-anything module. (#798)
* Return the low res mask.

* Add some validations.
2023-09-10 13:03:02 +01:00
Laurent Mazare 584171cae1
Add a wasm module for the segment anything example. (#797) 2023-09-10 12:29:37 +01:00
Laurent Mazare 35f72514f5
Move more models to candle-transformers (#796)
* Move dinov2.

* Move efficientnet.

* Move the quantized llama model.

* Move segment-anything.
2023-09-10 10:20:18 +01:00
Laurent Mazare d3f05eae8c
Move some models to candle-transformers so that it's easier to re-use. (#794)
* Move some models to candle-transformers so that they can be shared.

* Also move falcon.

* Move Llama.

* Move whisper (partial).
2023-09-10 09:40:27 +01:00
Laurent Mazare 618f4e4c78
Add some documentation. (#673)
* Add some documentation.

* Bump the crate version.
2023-08-30 11:54:00 +01:00
Laurent Mazare a3f97c143d
Bump the crate version + update CHANGELOG. (#628) 2023-08-27 18:17:11 +01:00
Laurent Mazare 6e485f2deb
Add some optional repeat penalty. (#623)
* Add some optional repeat penalty.

* Add the missing files.
2023-08-27 10:48:45 +01:00
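Note: the repeat penalty above dampens the logits of tokens already present in the recent context (the CTRL-style rule: divide positive logits by the penalty, multiply negative ones, so both move toward "less likely"). A hedged sketch, not candle's exact helper:

```rust
// CTRL-style repeat penalty over the tokens seen in the recent context.
fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[usize]) {
    for &tok in context {
        if let Some(l) = logits.get_mut(tok) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}

fn main() {
    let mut logits = vec![2.0, -1.0, 0.5];
    apply_repeat_penalty(&mut logits, 1.5, &[0, 1]);
    assert_eq!(logits, vec![2.0 / 1.5, -1.5, 0.5]);
}
```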
Laurent Mazare aba1e90797
Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
Laurent Mazare 3507e14c0c
Yolo v8 fixes (#542)
* Fixes for the yolo-v8 layout.

* Bugfixes.

* Another silly bugfix.

* Remove the hf-hub dependency.

* Remove the transformers dependency.
2023-08-21 21:05:40 +01:00
Laurent Mazare 912561614f
Better handling of zero temperatures. (#532) 2023-08-21 07:51:46 +01:00
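Note: with a zero temperature, scaling logits by 1/t degenerates, so generation typically falls back to greedy argmax. A minimal sketch of that dispatch (the sampling branch is elided):

```rust
// Zero (or negative) temperature means greedy decoding instead of sampling.
fn next_token(logits: &[f32], temperature: f64) -> usize {
    if temperature <= 0.0 {
        // Greedy: pick the highest logit directly.
        logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .unwrap()
    } else {
        // Scale logits by 1/temperature, then softmax-sample (elided here).
        unimplemented!("softmax sampling path")
    }
}

fn main() {
    assert_eq!(next_token(&[0.1, 2.3, -0.5], 0.0), 1);
}
```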
Laurent Mazare a8f61e66cc
Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
Laurent Mazare 531f23b4d0
Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare b278834267
Support the Accelerate BLAS on macOS. (#325)
* Add the accelerate feature.

* Ffi tweaks.
2023-08-05 17:25:24 +01:00
Laurent Mazare 4fe8a02f88
Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
Laurent Mazare 03a421f714
Add some missing readme files. (#304) 2023-08-02 10:57:12 +01:00
Laurent Mazare d38943aadc
Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare 51e51da896
Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
Laurent Mazare 3eb2bc6d07
Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
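Note: the numerical-stability fix for softmax is the classic one: subtract the row maximum before exponentiating, which leaves the result unchanged mathematically but prevents overflow for large logits. A self-contained sketch:

```rust
// Numerically stable softmax: exp(x - max) <= 1, so no overflow.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn main() {
    // Without the max subtraction, exp(1000.0) would overflow to infinity.
    let probs = softmax(&[1000.0, 1000.0]);
    assert_eq!(probs, vec![0.5, 0.5]);
}
```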
Laurent Mazare c34f932319
Fix the mkl build. (#204)
* Fix the mkl build.

* Fix the build properly.
2023-07-19 19:41:11 +01:00
Nicolas Patry 439321745a Removing `candle-hub` internal to extract into `hf-hub` standalone. 2023-07-19 15:04:38 +02:00
Laurent Mazare b8abe2bb4b
Factorize the tokenizers version in the workspace cargo def. (#186) 2023-07-18 06:48:13 +01:00
Laurent Mazare 104f89df31
Centralize the dependency versions and inherit them. (#177) 2023-07-16 07:47:17 +01:00
Nicolas Patry 4ed56d7861 Removing cuda default.
Seems very important for a lot of exploring users, usually on laptops
without GPUs.

Adding more README instructions in a follow-up.
2023-07-14 16:52:15 +02:00
Laurent Mazare ba35d895e7
Sketch the candle-transformers crate. (#147)
* Sketch the candle-transformers crate.

* Format the empty files.
2023-07-12 13:49:31 +01:00