Commit Graph

631 Commits

Author SHA1 Message Date
Santiago Medina ace282e5c2
Add flag to run Moondream in f16 precision (#2015)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2

* Add flag to use f16

* Avoid breaking the quantized version on cuda.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-05 07:03:33 +02:00
Laurent Mazare c87381fc96
Use F16 for moondream on cuda. (#2013) 2024-04-04 23:30:10 +02:00
Laurent Mazare f48c07e242
Include topk sampling in the quantized example. (#2005)
* Include topk sampling in the quantized example.

* Also sample with top-k on the mistral side.
2024-04-04 09:27:54 +02:00
Santiago Medina d17b2cdad9
Match Moondream's latest release (#1997)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
2024-04-02 21:37:09 +02:00
Laurent Mazare be9c200cbb
Expose the t5 config fields + allow t5-large. (#1987) 2024-04-01 20:58:34 +02:00
Santiago Medina ea0d8d3753
Quantized moondream implementation and BOS token (#1980)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Pass bos token at the beginning of tensor.

* Quantize moondream.

* Forward with image bos token.

* Clippy.

* Use q4_0 quantization.

* Add pointers for sequence and tokens; Remove seq_len conditional
2024-04-01 19:37:54 +02:00
Laurent Mazare b20acd622c
Update for pyo3 0.21. (#1985)
* Update for pyo3 0.21.

* Also adapt the RL example.

* Fix for the pyo3-onnx bindings...

* Print details on failures.

* Revert pyi.
2024-04-01 17:07:02 +02:00
Laurent Mazare c7557b65dc
Switch the default to using the faster kernels. (#1978)
* Switch the default to using the faster kernels.

* Add the force-dmmv flag.
2024-04-01 10:00:11 +02:00
Laurent Mazare cd29c7ccd4
More ggml cuda kernels (#1977)
* Add more cuda kernels for quantized matmul.

* Add the vec-dot bits.

* Expose the quantized matmul-vec kernels.

* Also include the quantize-q8-1 kernel.

* Glue code for the q8-1 quantization.

* mm-vec product via q8-1 quantization.

* Add a test.

* Add a mm test.

* Get the test to return some sensible results.

* Also test dmmv.

* Fix the launch params.

* Allow for tweaking the force_dmmv parameter while it's experimental.
2024-04-01 00:15:48 +02:00
Laurent Mazare f9954b73ba
Add options to use local files + specify a custom repo or branch. (#1973) 2024-03-31 09:32:50 +02:00
Laurent Mazare eead1dcead
Clippy fix. (#1972) 2024-03-31 08:57:40 +02:00
Santiago Medina 92f81d2fcb
Add Moondream transformer implementation and example (#1970)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward
2024-03-31 08:54:56 +02:00
Laurent Mazare 3144150b8d
Move the tensor-tools binary in a separate crate. (#1969) 2024-03-30 15:49:37 +01:00
Laurent Mazare 8ad12a0e81
Add some examples using the MT5 variants. (#1963) 2024-03-29 18:09:29 +01:00
Laurent Mazare eb1b27abcd
Readme fix. (#1961) 2024-03-28 23:24:46 +01:00
Laurent Mazare 708e422456
Qwen MoE model. (#1960)
* Qwen MoE model.

* Add the MoE model to the example.

* Fix the scaling.

* Readme updates.

* Readme tweaks.
2024-03-28 23:10:57 +01:00
Laurent Mazare c5092f2c29
Add a couple t5 models. (#1958) 2024-03-28 17:58:06 +01:00
Tigran Zhampeissov b0340d72ec
CLIP model implementation with example (#1950)
* CLIP model implementation with example

* CLIP Implementation fixes, batch images

* CLIP model remove images from git

* CLIP model remove unnecessary use of batch_indices
2024-03-28 13:44:12 +01:00
Laurent Mazare e2b4829531
Support more mistral models. (#1927)
* Support more mistral models.

* Use the appropriate rope parameter.
2024-03-24 08:04:04 +01:00
Laurent Mazare a00e24d752
Improve the error message on overlong prompts. (#1908) 2024-03-21 21:08:07 +01:00
Sanchit Gandhi bb3ee48039
whisper readme (#1899) 2024-03-21 12:54:09 +01:00
Sanchit Gandhi 0c11e055be
support distil-large-v3 (#1898) 2024-03-21 11:46:49 +01:00
Laurent Mazare 18036c6ccb
Update the image crate + use the re-exported version. (#1893)
* Update the image crate + use the re-exported version.

* Update to using ab_glyph.
2024-03-21 10:56:41 +01:00
Laurent Mazare 455c42aa72
Avoid copying the data on squeeze and unsqueeze. (#1884)
* Avoid copying the data on squeeze and unsqueeze.

* Fix the quantized llama example.

* Unrelated fix for the quantized stable-lm example on cuda.

* Fix for mamba on cuda (unrelated to the PR).
2024-03-20 13:04:36 +01:00
Laurent Mazare f115895b9e
Apply rustfmt. (#1873) 2024-03-18 21:43:31 +01:00
Gabriel 6a966cf9e0
Add a DQN example to the reinforcement-learning section (#1872) 2024-03-18 21:22:53 +01:00
Laurent Mazare 58605252e8
Microphone support for the encodec example. (#1866) 2024-03-18 11:19:46 +01:00
Laurent Mazare d365ef32d9
Improve the encodec example: handle resampling. (#1865)
* Improve the encodec example: handle resampling.

* Play the audio directly.
2024-03-18 10:09:40 +01:00
Laurent Mazare a15f859ab4
Fix for the encodec example. (#1861) 2024-03-17 21:15:12 +01:00
Laurent Mazare 74bf6994b1
Move the image tensor to the appropriate device. (#1856) 2024-03-16 22:25:46 +01:00
Jani Monoses e1f9c3776d
StableLM-2 models were updated to use GPT-2 tokenization. (#1847) 2024-03-14 21:01:36 +01:00
Tyler Rockwood 3318fe30fb
Update gemma README (#1843)
* Update gemma README

* Fixit
2024-03-13 21:41:36 +01:00
Laurent Mazare 56c9d3ee7b
Fix the model path for rwkv. (#1825) 2024-03-09 11:21:48 +01:00
Laurent Mazare dd00482ea3
Quantized version of the metavoice model. (#1824)
* Quantized version of the metavoice model.

* Integrate the quantized version of metavoice.
2024-03-09 11:06:04 +01:00
Laurent Mazare 3440cec3a0
Fast CPU kernel for transposed 1d convolutions. (#1822)
* Fast CPU kernel for transposed 1d convolutions.

* Bugfix.
2024-03-08 22:43:07 +01:00
Niklas Hallqvist 0a3487a776
Add a --seed argument to the stable-diffusion example. (#1812)
* Add a --seed argument to the stable-diffusion example.

* Make the case when no seed is specified, that it will not be set, but use the engine's default. This will make the CPU engine work again when no --seed is given, and will cause a bailout when a seed is there, as the engine does not currently support it.

---------

Co-authored-by: niklas <niklas@appli.se>
2024-03-08 08:17:36 +01:00
Laurent Mazare 8a99cf7dd2
Add a flag to select the dtype used in metavoice. (#1805) 2024-03-05 12:16:00 +01:00
Jiayu Liu 924ccae30c
Add an initial Segformer implementation (#1617)
* add segformer

* Make the id2label field optional.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-03 16:01:46 +01:00
Laurent Mazare 60dc72b96b
More metavoice tweaks. (#1796) 2024-03-03 15:05:25 +01:00
Laurent Mazare 20abb72fec
Normalize loudness of the generated audio (#1795)
* Normalize loudness of the generated audio.

* Lints.

* One more lint.

* Avoid running the bs1770 tests.

* Another attempt at discarding doc comments.

* Also normalize the loudness in the encodec example.
2024-03-03 14:00:42 +01:00
Laurent Mazare ca5d727ba2
Use the same padding in metavoice as in the python version. (#1794) 2024-03-03 12:04:48 +01:00
Laurent Mazare 09e0148cce
Tweaks to run metavoice on metal (#1792)
* Enable tanh + tweak conv-transpose.

* Run the encodec decoding on cpu.

* Clippy fixes.
2024-03-03 07:46:44 +01:00
Laurent Mazare de11623752
Metavoice position fix (#1791)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Fix the position index in metavoice.
2024-03-02 21:00:35 +01:00
Laurent Mazare 21f1d04976
Add the instruction finetuned gemma variants. (#1790) 2024-03-02 18:56:59 +01:00
Laurent Mazare 4fff5b51f5
Metavoice - first cut (#1717)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Add a warning.
2024-03-02 18:50:01 +01:00
Jack Shih 6980774a91
fix rwkv example eos token (#1785) 2024-03-01 10:22:28 +01:00
Laurent Mazare 64d4038e4f
Mention rwkv v6 in the readmes. (#1784) 2024-03-01 08:58:30 +01:00
Jani Monoses 979deaca07
EfficientVit (MSRA) model (#1783)
* Add EfficientVit (Microsoft Research Asia) model.

* Mention models in README
2024-03-01 08:53:52 +01:00
Jack Shih b485e4b6ee
add models of rwkv v6 and quantized rwkv v6 (#1781)
* add models of rwkv v6 and quantized rwkv v6

* fix ci clippy fail
2024-03-01 08:37:56 +01:00
Laurent Mazare 4fd00b8900
Add the StarCoder2 model. (#1779)
* Add the StarCoder2 model.

* Add the example code and get things to work.

* And also tweak the readme.
2024-02-28 21:02:41 +01:00