Commit Graph

1851 Commits

Author SHA1 Message Date
laurent 101a4c8389
Moondream first bits. 2024-03-17 17:49:56 +01:00
Laurent Mazare ce9fbc3682
Optimize the cat operation on contiguous tensors (#1855)
* Add a specialized kernel for copy2d.

* Move the cat operations.

* Avoid transpositions in cat.

* Bugfix.

* Bugfix for the cuda kernel.

* Add a benchmark.

* Add more testing.

* Test fix.

* Faster kernel.

* Add the missing kernel.

* Tweak the test.

* Add a metal kernel.

* Fix for the metal kernel.

* Get the tests to pass on metal.

* Also use this opportunity to fix the metal kernel for ELU.

* Add some bf16 kernels.

* Clippy fixes.
2024-03-17 10:49:13 +01:00
Thomas Santerre db8b24ae92
Add support for index u8/i64 and input f16/bf16 scatter-add on metal (#1849)
* add support and tests for scatter add on metal

* add support for all datatypes
2024-03-17 08:09:43 +01:00
Laurent Mazare 74bf6994b1
Move the image tensor to the appropriate device. (#1856) 2024-03-16 22:25:46 +01:00
Laurent Mazare cdc4c172c4
Implement the error trait for DTypeParseError. (#1852) 2024-03-15 08:37:27 +01:00
Jani Monoses e1f9c3776d
StableLM-2 models were updated to use GPT-2 tokenization. (#1847) 2024-03-14 21:01:36 +01:00
Tyler Rockwood 3318fe30fb
Update gemma README (#1843)
* Update gemma README

* Fixit
2024-03-13 21:41:36 +01:00
Thomas Santerre 2bb9c683b9
Update README.md (#1840)
Add candle-einops to the README as an external resource.
2024-03-13 14:36:25 +01:00
Laurent Mazare ff03fd3fb3
Expose some helper functions to create quantized models. (#1837) 2024-03-12 11:30:24 +01:00
Laurent Mazare df5f69444e
Properly handle the batch dimension in cuda quantized matmul. (#1832) 2024-03-10 20:23:43 +01:00
Laurent Mazare 0c5eecbc0f
Add some tracing to metavoice. (#1826) 2024-03-09 12:24:11 +01:00
Laurent Mazare 56c9d3ee7b
Fix the model path for rwkv. (#1825) 2024-03-09 11:21:48 +01:00
Laurent Mazare dd00482ea3
Quantized version of the metavoice model. (#1824)
* Quantized version of the metavoice model.

* Integrate the quantized version of metavoice.
2024-03-09 11:06:04 +01:00
Laurent Mazare 936f6a4840
Fix dequantization. (#1823) 2024-03-08 23:12:13 +01:00
Laurent Mazare 3440cec3a0
Fast CPU kernel for transposed 1d convolutions. (#1822)
* Fast CPU kernel for transposed 1d convolutions.

* Bugfix.
2024-03-08 22:43:07 +01:00
Laurent Mazare e7fc1daa21
Bump the crate versions to 0.4.2. (#1821) 2024-03-08 22:01:51 +01:00
Niklas Hallqvist be5b68cd0b
Metal random-generation bug fixes (#1811)
* The use_resource API was misunderstood: it is not additive, so several usages must be bit-ORed together.

* The seeding was incorrect: it used the address of the passed-in seed instead of its value.

* Add a check that is likely to catch a failure to update the seed between generations of random tensors.

* Buffer overrun: the length given to the std::ptr::copy call was in bytes, not in 32-bit units.

* By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine.
Use device.set_seed if determinism is warranted.

* Revert "By default seed the RNG with a time-based value, so that different runs may produce different output, just like the CPU engine. Use device.set_seed if determinism is warranted."

This reverts commit d7302de9

Discussion in https://github.com/huggingface/candle/pull/1811#issuecomment-1983079119

* The Metal random kernel failed to set element N/2 of tensors with N elements when N was even. The reason was that every thread except thread 0 created 2 random samples, while thread 0 created only one, i.e. an odd total. To produce an even number of samples, thread 0 should only terminate early for odd-sized tensors.

* Add a test catching any deterministic tensor element in rand and randn output.

---------

Co-authored-by: niklas <niklas@appli.se>
Co-authored-by: Ivar Flakstad <69173633+ivarflakstad@users.noreply.github.com>
2024-03-08 16:11:50 +01:00
Laurent Mazare ea984d0421
Expose more printer options. (#1817) 2024-03-08 15:04:18 +01:00
Laurent Mazare 9634583781
Expose a couple layout methods. (#1816) 2024-03-08 10:52:22 +01:00
Kirpal Grewal 758366160e
add clone to candle dropout (#1814) 2024-03-08 08:18:01 +01:00
Niklas Hallqvist 0a3487a776
Add a --seed argument to the stable-diffusion example. (#1812)
* Add a --seed argument to the stable-diffusion example.

* When no seed is specified, do not set one and use the engine's default instead. This makes the CPU engine work again when no --seed is given, and causes a bailout when a seed is given, as the engine does not currently support seeding.

---------

Co-authored-by: niklas <niklas@appli.se>
2024-03-08 08:17:36 +01:00
ivarflakstad 0c09d10f32
Improve metal buffer usage (#1807)
* Improve metal buffer usage

* Clone cpu storage when loading to reduce wait_until_complete calls
* Use powers of two for buffer sizes so reuse is more likely.
* Select best available buffer by size.
* Add a count to MetalStorage so that a buffer of a different size can be used.

Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>

* Simplify new buffer creation without blit copy. Revert &[] -> Vec

* Add documentation on newBufferWithBytes safety / synchronization

* Drop unused buffers after command buffer is done syncing.

---------

Co-authored-by: Chris Fleetwood <christopher.fleetwood@huggingface.co>
2024-03-07 09:42:34 +01:00
Laurent Mazare 8a99cf7dd2
Add a flag to select the dtype used in metavoice. (#1805) 2024-03-05 12:16:00 +01:00
Laurent Mazare bd9ab9bc04
Add a cuda kernel for dequantizing q8_0. (#1804) 2024-03-05 09:50:37 +01:00
Laurent Mazare 8cc0a183ba
Speaker embeddings computation for metavoice. (#1800)
* Speaker embeddings computation for metavoice.

* Compute the speaker embeddings.
2024-03-04 14:13:01 +01:00
Laurent Mazare 6530932285
Add the new models to the main readme. (#1797) 2024-03-03 16:25:14 +01:00
Jiayu Liu 924ccae30c
Add an initial Segformer implementation (#1617)
* add segformer

* Make the id2label field optional.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-03 16:01:46 +01:00
Laurent Mazare 60dc72b96b
More metavoice tweaks. (#1796) 2024-03-03 15:05:25 +01:00
Laurent Mazare 20abb72fec
Normalize loudness of the generated audio (#1795)
* Normalize loudness of the generated audio.

* Lints.

* One more lint.

* Avoid running the bs1770 tests.

* Another attempt at discarding doc comments.

* Also normalize the loudness in the encodec example.
2024-03-03 14:00:42 +01:00
Laurent Mazare ca5d727ba2
Use the same padding in metavoice as in the python version. (#1794) 2024-03-03 12:04:48 +01:00
Laurent Mazare 09e0148cce
Tweaks to run metavoice on metal (#1792)
* Enable tanh + tweak conv-transpose.

* Run the encodec decoding on cpu.

* Clippy fixes.
2024-03-03 07:46:44 +01:00
Laurent Mazare de11623752
Metavoice position fix (#1791)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Fix the position index in metavoice.
2024-03-02 21:00:35 +01:00
Laurent Mazare 21f1d04976
Add the instruction finetuned gemma variants. (#1790) 2024-03-02 18:56:59 +01:00
Laurent Mazare 4fff5b51f5
Metavoice - first cut (#1717)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Add a warning.
2024-03-02 18:50:01 +01:00
Laurent Mazare 314630638d
Rustfmt fix. (#1788) 2024-03-02 10:35:07 +01:00
Frkri 3e3def4134
Update StableLM config (#1787) 2024-03-02 09:56:57 +01:00
Jack Shih 6980774a91
fix rwkv example eos token (#1785) 2024-03-01 10:22:28 +01:00
Laurent Mazare 64d4038e4f
Mention rwkv v6 in the readmes. (#1784) 2024-03-01 08:58:30 +01:00
Jani Monoses 979deaca07
EfficientVit (MSRA) model (#1783)
* Add EfficientVit (Microsoft Research Asia) model.

* Mention models in README
2024-03-01 08:53:52 +01:00
Jack Shih b485e4b6ee
add models of rwkv v6 and quantized rwkv v6 (#1781)
* add models of rwkv v6 and quantized rwkv v6

* fix ci clippy fail
2024-03-01 08:37:56 +01:00
laurent 2c95b7394a
Handle Q5_0 and Q5_1 quants in cuda. 2024-02-29 10:54:01 +01:00
Laurent Mazare 4fd00b8900
Add the StarCoder2 model. (#1779)
* Add the StarCoder2 model.

* Add the example code and get things to work.

* And also tweak the readme.
2024-02-28 21:02:41 +01:00
Laurent Mazare 57267cd536
Add a flag to force running the quantized model on CPUs. (#1778)
* Add a flag to force running the quantized model on CPUs.

* Add encodec to the readme.
2024-02-28 14:58:42 +01:00
Laurent Mazare 60ee5cfd4d
Support more modes in the encodec example. (#1777)
* Support more modes in the encodec example.

* Remove the old encodec model from the musicgen bits.
2024-02-28 09:22:33 +01:00
Laurent Mazare 56e44aabe3
Make some dependencies optional in the examples. (#1776) 2024-02-28 07:17:03 +01:00
Laurent Mazare d0aca6c3c6
Encodec encoding demo. (#1775) 2024-02-28 06:49:03 +01:00
Laurent Mazare 15e8644149
Apply dilations in the encodec model. (#1772)
* Apply dilations in the encodec model.

* Add some encoding bits.
2024-02-27 23:26:35 +01:00
Laurent Mazare 0c49e95dfb
Encodec model. (#1771)
* Encodec model.

* Fixes.

* Add the padding functions.

* Get the LSTM bit to work.

* Get the encodec model to generate some tokens (decoder only for now).

* Minor tweak.

* Minor tweak.
2024-02-27 22:59:40 +01:00
Laurent Mazare 205767f9de
Avoid tensor copying in the quantized example. (#1770) 2024-02-27 20:32:30 +01:00
Laurent Mazare 5e526abc8c
Bump the version number to 0.4.1. (#1768)
* Fix the block size for some cuda kernels.

* Bump the version number to 0.4.1.
2024-02-27 14:19:59 +01:00