Commit Graph

744 Commits

Laurent Mazare c97d639fa0
Multiprocess/multi-GPU support for llama 3. (#2092)
* Multiprocess/multi-GPU support for llama 3.

* Modernize the mp example a bit.
2024-04-20 12:49:21 +02:00
Laurent Mazare 9c532aef47
Also enable llama-v3 8b instruct. (#2088) 2024-04-19 08:50:06 +02:00
Thomas Santerre f7a6468238
Add support for llama3 on the quantized example (#2086)
* add support for l3b, new tokenizer

* add todo

* Add todo and use k_s model

* Use the official tokenizers.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-18 22:52:00 +02:00
Laurent Mazare e6ee7ba4d4
Llama v3. (#2085)
* Llama v3.

* Tweak the default params + handle special tokens.

* Small tweak.
2024-04-18 22:19:54 +02:00
NorilskMajor 4d14777673
Utilize batches in Stable Diffusion (#2071)
* Utilize the batch support in Stable Diffusion that was already there but unused.

Also refactor out the `save_image` function.

* Clippy + cosmetic fixes.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-16 06:49:04 +02:00
Harry Stern c119600d6e
Move image tensor to device in trocr example (#2063)
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-04-15 06:50:32 +02:00
Laurent Mazare 50e49ecc5f
Add a quantized version of recurrent-gemma. (#2054)
* Add a quantized version of recurrent-gemma.

* Share the rglru part.

* Get the quantized gemma model to work.
2024-04-13 20:07:01 +02:00
Laurent Mazare 26cbbf8d84
Mandatory topk sampling for recurrent-gemma. (#2051) 2024-04-13 10:31:39 +02:00
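Top-k sampling restricts generation to the k most likely tokens. A minimal sketch of the filtering step (hypothetical helper, not the actual code from #2051):

```rust
/// Keep the k largest logits; mask everything else out so that a
/// subsequent softmax + sampling step only sees the top-k candidates.
fn top_k_filter(logits: &mut [f32], k: usize) {
    if k == 0 || k >= logits.len() {
        return;
    }
    let mut sorted = logits.to_vec();
    // Descending sort; assumes no NaN logits.
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let threshold = sorted[k - 1];
    for l in logits.iter_mut() {
        if *l < threshold {
            *l = f32::NEG_INFINITY;
        }
    }
}
```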
Laurent Mazare 2bf413caa3
Add the recurrent-gemma model. (#2039)
* Start adding the recurrent-gemma model.

* More griffin.

* Add the example + get the weights to load from the HF version.

* More inference code.

* Rope + kv-cache on the attention side.

* Add to the inference code.

* Add more to the recurrent gemma inference.

* Get some first inference to run.

* Add the softcap on logits.

* Fixes.

* Use partial rotary embeddings.

* Get inference to work.

* Add a comment.

* And add a readme.
2024-04-13 00:05:21 +02:00
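One bullet above adds the softcap on logits; in the Griffin/Gemma family this squashes logits into (-cap, cap) via `cap * tanh(x / cap)`. A plain-f32 sketch of the formula (the model code operates on tensors):

```rust
/// Soft-cap logits into (-cap, cap): x -> cap * tanh(x / cap).
fn soft_cap(logits: &mut [f32], cap: f32) {
    for l in logits.iter_mut() {
        *l = cap * (*l / cap).tanh();
    }
}
```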
Laurent Mazare a0460cd2b1
Add the code-gemma models. (#2038)
* Add the code-gemma models.

* Tweak to the gemma config.
2024-04-10 21:19:21 +02:00
Laurent Mazare b81ecf712d
Support alternative dtypes for mamba (#2036)
* Allow different dtypes in mamba.

* Add a dtype flag.
2024-04-10 18:10:01 +02:00
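A dtype flag typically just maps a command-line string to a `candle_core::DType`; a hedged sketch of that mapping (the helper name is illustrative):

```rust
use candle_core::DType;

/// Map a command-line string such as "bf16" to a candle dtype.
fn parse_dtype(s: &str) -> Result<DType, String> {
    match s {
        "f16" => Ok(DType::F16),
        "bf16" => Ok(DType::BF16),
        "f32" => Ok(DType::F32),
        _ => Err(format!("unsupported dtype {s}")),
    }
}
```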
Laurent Mazare 7f354473cf
Optimize copy-2d for metal. (#2024)
* Optimize copy-2d for metal.

* Add a hacky stopping rule for moondream.
2024-04-07 12:34:16 +02:00
Laurent Mazare 33c9b66554
Add the new gemma models. (#2023)
* Add the new gemma models.

* Revert the lightning changes.

* Support for the 1.1 models.
2024-04-06 21:25:38 +02:00
Santiago Medina ace282e5c2
Add flag to run Moondream in f16 precision (#2015)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2

* Add flag to use f16

* Avoid breaking the quantized version on cuda.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-05 07:03:33 +02:00
Laurent Mazare c87381fc96
Use F16 for moondream on cuda. (#2013) 2024-04-04 23:30:10 +02:00
Laurent Mazare f48c07e242
Include topk sampling in the quantized example. (#2005)
* Include topk sampling in the quantized example.

* Also sample with top-k on the mistral side.
2024-04-04 09:27:54 +02:00
Santiago Medina d17b2cdad9
Match Moondream's latest release (#1997)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2
2024-04-02 21:37:09 +02:00
Laurent Mazare be9c200cbb
Expose the t5 config fields + allow t5-large. (#1987) 2024-04-01 20:58:34 +02:00
Santiago Medina ea0d8d3753
Quantized moondream implementation and BOS token (#1980)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Pass bos token at the beginning of tensor.

* Quantize moondream.

* Forward with image bos token.

* Clippy.

* Use q4_0 quantization.

* Add pointers for sequence and tokens; Remove seq_len conditional
2024-04-01 19:37:54 +02:00
Laurent Mazare b20acd622c
Update for pyo3 0.21. (#1985)
* Update for pyo3 0.21.

* Also adapt the RL example.

* Fix for the pyo3-onnx bindings...

* Print details on failures.

* Revert pyi.
2024-04-01 17:07:02 +02:00
Laurent Mazare c7557b65dc
Switch the default to using the faster kernels. (#1978)
* Switch the default to using the faster kernels.

* Add the force-dmmv flag.
2024-04-01 10:00:11 +02:00
Laurent Mazare cd29c7ccd4
More ggml cuda kernels (#1977)
* Add more cuda kernels for quantized matmul.

* Add the vec-dot bits.

* Expose the quantized matmul-vec kernels.

* Also include the quantize-q8-1 kernel.

* Glue code for the q8-1 quantization.

* mm-vec product via q8-1 quantization.

* Add a test.

* Add a mm test.

* Get the test to return some sensible results.

* Also test dmmv.

* Fix the launch params.

* Allow for tweaking the force_dmmv parameter while it's experimental.
2024-04-01 00:15:48 +02:00
Laurent Mazare f9954b73ba
Add options to use local files + specify a custom repo or branch. (#1973) 2024-03-31 09:32:50 +02:00
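With hf-hub, pointing at a custom repo or branch comes down to `Repo::with_revision`; a sketch of the pattern (the exact flag wiring in #1973 may differ):

```rust
use hf_hub::{api::sync::Api, Repo, RepoType};
use std::path::PathBuf;

/// Fetch `filename` from `model_id` at a given revision (branch, tag or sha).
fn fetch(model_id: &str, revision: &str, filename: &str) -> anyhow::Result<PathBuf> {
    let api = Api::new()?;
    let repo = api.repo(Repo::with_revision(
        model_id.to_string(),
        RepoType::Model,
        revision.to_string(),
    ));
    Ok(repo.get(filename)?)
}
```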
Laurent Mazare eead1dcead
Clippy fix. (#1972) 2024-03-31 08:57:40 +02:00
Santiago Medina 92f81d2fcb
Add Moondream transformer implementation and example (#1970)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward
2024-03-31 08:54:56 +02:00
Laurent Mazare 3144150b8d
Move the tensor-tools binary in a separate crate. (#1969) 2024-03-30 15:49:37 +01:00
Laurent Mazare 8ad12a0e81
Add some examples using the MT5 variants. (#1963) 2024-03-29 18:09:29 +01:00
Laurent Mazare eb1b27abcd
Readme fix. (#1961) 2024-03-28 23:24:46 +01:00
Laurent Mazare 708e422456
Qwen MoE model. (#1960)
* Qwen MoE model.

* Add the MoE model to the example.

* Fix the scaling.

* Readme updates.

* Readme tweaks.
2024-03-28 23:10:57 +01:00
Laurent Mazare c5092f2c29
Add a couple t5 models. (#1958) 2024-03-28 17:58:06 +01:00
Tigran Zhampeissov b0340d72ec
CLIP model implementation with example (#1950)
* CLIP model implementation with example

* CLIP Implementation fixes, batch images

* CLIP model remove images from git

* CLIP model remove unnecessary use of batch_indices
2024-03-28 13:44:12 +01:00
Laurent Mazare e2b4829531
Support more mistral models. (#1927)
* Support more mistral models.

* Use the appropriate rope parameter.
2024-03-24 08:04:04 +01:00
Laurent Mazare a00e24d752
Improve the error message on overlong prompts. (#1908) 2024-03-21 21:08:07 +01:00
Sanchit Gandhi bb3ee48039
whisper readme (#1899) 2024-03-21 12:54:09 +01:00
Sanchit Gandhi 0c11e055be
support distil-large-v3 (#1898) 2024-03-21 11:46:49 +01:00
Laurent Mazare 18036c6ccb
Update the image crate + use the re-exported version. (#1893)
* Update the image crate + use the re-exported version.

* Update to using ab_glyph.
2024-03-21 10:56:41 +01:00
Laurent Mazare 455c42aa72
Avoid copying the data on squeeze and unsqueeze. (#1884)
* Avoid copying the data on squeeze and unsqueeze.

* Fix the quantized llama example.

* Unrelated fix for the quantized stable-lm example on cuda.

* Fix for mamba on cuda (unrelated to the PR).
2024-03-20 13:04:36 +01:00
Laurent Mazare f115895b9e
Apply rustfmt. (#1873) 2024-03-18 21:43:31 +01:00
Gabriel 6a966cf9e0
Add a DQN example to the reinforcement-learning section (#1872) 2024-03-18 21:22:53 +01:00
Laurent Mazare 58605252e8
Microphone support for the encodec example. (#1866) 2024-03-18 11:19:46 +01:00
Laurent Mazare d365ef32d9
Improve the encodec example: handle resampling. (#1865)
* Improve the encodec example: handle resampling.

* Play the audio directly.
2024-03-18 10:09:40 +01:00
Laurent Mazare a15f859ab4
Fix for the encodec example. (#1861) 2024-03-17 21:15:12 +01:00
Laurent Mazare 74bf6994b1
Move the image tensor to the appropriate device. (#1856) 2024-03-16 22:25:46 +01:00
Jani Monoses e1f9c3776d
StableLM-2 models were updated to use GPT-2 tokenization. (#1847) 2024-03-14 21:01:36 +01:00
Tyler Rockwood 3318fe30fb
Update gemma README (#1843)
* Update gemma README

* Fixit
2024-03-13 21:41:36 +01:00
Laurent Mazare 56c9d3ee7b
Fix the model path for rwkv. (#1825) 2024-03-09 11:21:48 +01:00
Laurent Mazare dd00482ea3
Quantized version of the metavoice model. (#1824)
* Quantized version of the metavoice model.

* Integrate the quantized version of metavoice.
2024-03-09 11:06:04 +01:00
Laurent Mazare 3440cec3a0
Fast CPU kernel for transposed 1d convolutions. (#1822)
* Fast CPU kernel for transposed 1d convolutions.

* Bugfix.
2024-03-08 22:43:07 +01:00
Niklas Hallqvist 0a3487a776
Add a --seed argument to the stable-diffusion example. (#1812)
* Add a --seed argument to the stable-diffusion example.

* When no seed is specified, do not set one but use the engine's default. This makes the CPU engine work again when no --seed is given, and bails out when a seed is provided, as the engine does not currently support it.

---------

Co-authored-by: niklas <niklas@appli.se>
2024-03-08 08:17:36 +01:00
Laurent Mazare 8a99cf7dd2
Add a flag to select the dtype used in metavoice. (#1805) 2024-03-05 12:16:00 +01:00
Jiayu Liu 924ccae30c
Add an initial Segformer implementation (#1617)
* add segformer

* Make the id2label field optional.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-03-03 16:01:46 +01:00
Laurent Mazare 60dc72b96b
More metavoice tweaks. (#1796) 2024-03-03 15:05:25 +01:00
Laurent Mazare 20abb72fec
Normalize loudness of the generated audio (#1795)
* Normalize loudness of the generated audio.

* Lints.

* One more lint.

* Avoid running the bs1770 tests.

* Another attempt at discarding doc comments.

* Also normalize the loudness in the encodec example.
2024-03-03 14:00:42 +01:00
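Loudness normalization boils down to measuring integrated loudness (what the bs1770 crate provides, in LUFS) and applying one linear gain; a sketch, with `measured_lufs` assumed to come from such a meter:

```rust
/// Scale samples so the measured loudness lands on the target loudness.
/// A dB difference maps to a linear gain of 10^(dB / 20).
fn normalize_loudness(samples: &mut [f32], measured_lufs: f32, target_lufs: f32) {
    let gain_db = target_lufs - measured_lufs;
    let gain = 10f32.powf(gain_db / 20.0);
    for s in samples.iter_mut() {
        *s *= gain;
    }
}
```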
Laurent Mazare ca5d727ba2
Use the same padding in metavoice as in the python version. (#1794) 2024-03-03 12:04:48 +01:00
Laurent Mazare 09e0148cce
Tweaks to run metavoice on metal (#1792)
* Enable tanh + tweak conv-transpose.

* Run the encodec decoding on cpu.

* Clippy fixes.
2024-03-03 07:46:44 +01:00
Laurent Mazare de11623752
Metavoice position fix (#1791)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Fix the position index in metavoice.
2024-03-02 21:00:35 +01:00
Laurent Mazare 21f1d04976
Add the instruction finetuned gemma variants. (#1790) 2024-03-02 18:56:59 +01:00
Laurent Mazare 4fff5b51f5
Metavoice - first cut (#1717)
* Add the metavoice transformer.

* Sketch the speaker-encoder module.

* Adding to the metavoice model.

* Start adding the metavoice example.

* Get some logits out.

* Load the second stage model.

* Get the second step to run.

* Tweak the example.

* Add encodec tilting.

* Glue the different bits together.

* Fix a shape issue.

* Use a constant.

* BPE tokenization.

* Add a warning.
2024-03-02 18:50:01 +01:00
Jack Shih 6980774a91
fix rwkv example eos token (#1785) 2024-03-01 10:22:28 +01:00
Laurent Mazare 64d4038e4f
Mention rwkv v6 in the readmes. (#1784) 2024-03-01 08:58:30 +01:00
Jani Monoses 979deaca07
EfficientVit (MSRA) model (#1783)
* Add EfficientVit (Microsoft Research Asia) model.

* Mention models in README
2024-03-01 08:53:52 +01:00
Jack Shih b485e4b6ee
add models of rwkv v6 and quantized rwkv v6 (#1781)
* add models of rwkv v6 and quantized rwkv v6

* fix ci clippy fail
2024-03-01 08:37:56 +01:00
Laurent Mazare 4fd00b8900
Add the StarCoder2 model. (#1779)
* Add the StarCoder2 model.

* Add the example code and get things to work.

* And also tweak the readme.
2024-02-28 21:02:41 +01:00
Laurent Mazare 57267cd536
Add a flag to force running the quantized model on CPUs. (#1778)
* Add a flag to force running the quantized model on CPUs.

* Add encodec to the readme.
2024-02-28 14:58:42 +01:00
Laurent Mazare 60ee5cfd4d
Support more modes in the encodec example. (#1777)
* Support more modes in the encodec example.

* Remove the old encodec model from the musicgen bits.
2024-02-28 09:22:33 +01:00
Laurent Mazare 56e44aabe3
Make some dependencies optional in the examples. (#1776) 2024-02-28 07:17:03 +01:00
Laurent Mazare d0aca6c3c6
Encodec encoding demo. (#1775) 2024-02-28 06:49:03 +01:00
Laurent Mazare 0c49e95dfb
Encodec model. (#1771)
* Encodec model.

* Fixes.

* Add the padding functions.

* Get the LSTM bit to work.

* Get the encodec model to generate some tokens (decoder only for now).

* Minor tweak.

* Minor tweak.
2024-02-27 22:59:40 +01:00
Laurent Mazare 32544a2ad6
Add an option to split the prompt. (#1766) 2024-02-27 11:24:11 +01:00
Jack Shih 918136ba46
add quantized rwkv v5 model (#1743)
* Add quantized rwkv v5 model

* Integrate the quantized rwkv model in the initial example.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-25 21:43:40 +01:00
Laurent Mazare 2f22afd80e
Cuda acceleration for quantized model. (#1754)
* Boilerplate for the quantized cuda support.

* More basic cuda support.

* More cuda quantization (quantize on cpu for now).

* Add the dequantization bit.

* Start adding some dedicated cuda kernels from llama.cpp.

* Move the kernel code.

* Start interfacing with the kernel.

* Tweak the kernel launch params.

* Bugfix for quantized metal.

* Fix some clippy lints.

* Tweak the launch parameters.

* Tweak cuda basics to perform a quantized matmul.

* Perform the dequantization on the cpu + use cublas for matmul.

* Add the dequantization kernel.

* Test the qmatmul.

* More kernels.

* Matmul-vec kernel.

* Add a couple kernels.

* More dequantization kernels.
2024-02-25 18:11:47 +01:00
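The q8-1 path mentioned above quantizes activations into fixed-size blocks with a per-block scale (ggml's q8_1 additionally stores a per-block sum). A simplified, q8_0-style sketch of the per-block step:

```rust
/// Quantize a block of 32 floats to int8 with a single scale.
fn quantize_block(xs: &[f32; 32]) -> (f32, [i8; 32]) {
    let amax = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    let scale = amax / 127.0;
    let inv = if scale == 0.0 { 0.0 } else { 1.0 / scale };
    let mut qs = [0i8; 32];
    for (q, x) in qs.iter_mut().zip(xs) {
        *q = (x * inv).round() as i8;
    }
    (scale, qs)
}
```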
Laurent Mazare 8d04f70f4d
Fix the eos token for gemma. (#1753) 2024-02-24 11:07:02 +01:00
Daniel Varga 32eb56d6b3
Fix typo in README (#1740) 2024-02-22 12:35:26 +01:00
Laurent Mazare 28057781aa
Make the cache for the llama model explicit too. (#1745) 2024-02-22 12:04:33 +01:00
laurent 544018b6d0 Explicit caching in llama2.c. 2024-02-22 10:22:03 +01:00
Laurent Mazare 45d5322d62
Add the Gemma models. (#1741)
* Add the Gemma models.

* Add the gemma example.

* Adapt the RmsNorm.

* Get the 2b model to work.

* 7b support.

* Use the config head dim.

* Yet another fix.

* Make the matrices contiguous.

* Also get the 7b model to work.

* And add to the readme.
2024-02-21 22:02:50 +01:00
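"Adapt the RmsNorm" likely refers to Gemma scaling by `(1 + weight)` instead of `weight`; a plain-slice sketch of the per-row computation under that assumption:

```rust
/// Gemma-style RMS norm: normalize by the root-mean-square, then scale
/// by (1 + w) rather than the usual w.
fn gemma_rms_norm(xs: &mut [f32], weight: &[f32], eps: f32) {
    let mean_sq = xs.iter().map(|x| x * x).sum::<f32>() / xs.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    for (x, w) in xs.iter_mut().zip(weight) {
        *x = *x * inv_rms * (1.0 + w);
    }
}
```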
Laurent Mazare 7c7400fb63
Use the tokenizer-output-stream in the llama example. (#1715)
* Use the tokenizer-output-stream in the llama example.

* Also use tokenizer-output-stream for llama2-c.
2024-02-15 16:47:33 +01:00
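`TokenOutputStream` (from candle-examples) buffers token ids so that characters spanning several tokens print correctly while streaming; a rough usage sketch:

```rust
use candle_examples::token_output_stream::TokenOutputStream;

fn stream(tokenizer: tokenizers::Tokenizer, token_ids: &[u32]) -> anyhow::Result<()> {
    let mut tos = TokenOutputStream::new(tokenizer);
    for &t in token_ids {
        // Only returns text once it forms a complete printable chunk.
        if let Some(text) = tos.next_token(t)? {
            print!("{text}");
        }
    }
    if let Some(rest) = tos.decode_rest()? {
        print!("{rest}");
    }
    Ok(())
}
```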
Laurent Mazare 058a910d0e
Add a readme for rwkv. (#1712) 2024-02-14 15:31:33 +01:00
Laurent Mazare 26fe162ab5
Custom tokenizer for rwkv. (#1711)
* Custom tokenizer for rwkv.

* Custom tokenizer.

* Getting the tokenizer to work.
2024-02-14 15:11:38 +01:00
Laurent Mazare 2d5f2a728d
Add the RWKV model (v5). (#1707)
* Start adding the RWKV model.

* More of the forward step.

* Handle rescaling.

* FeedForward.

* More work on RWKV.

* Better state tracking.

* Finish a first pass on forward.

* Fix the shape mismatches.

* Do not rescale in f32.

* Rename to rwkv-v5.

* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
Jani Monoses 68f7655895
Add ConvNeXt-V2 and smaller model variants. (#1709) 2024-02-14 10:53:07 +01:00
Laurent Mazare ad73e93da2
Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
drbh 13c67226e6
feat: support microphone whisper streaming (#1678)
* feat: support microphone whisper streaming

* fix: cleanup print stmts and adjust how input is read

* fix: remove incorrect comment

* feat: split into new example and simplify

* fix: feature flag example file

* fix: fmt fixes

* feat: simplify and remove redundant files
2024-02-12 18:01:21 +01:00
Laurent Mazare 1e26d539d9
Improved mamba model optimized for inference (#1694)
* Sketch the mamba model for inference.

* Complete the forward pass.

* Add the mamba example.

* Optimize the selective-scan part.

* Fix a couple shape mismatches and get inference to work.

* Tweak the readmes.

* More readme tweaks.
2024-02-11 17:04:57 +01:00
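At its core the selective scan is a gated linear recurrence over time; a deliberately scalar sketch of one channel (real mamba keeps a state vector per channel and derives the coefficients per step from the input):

```rust
/// One-channel linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.
fn scan(xs: &[f32], a: f32, b: f32, c: f32) -> Vec<f32> {
    let mut h = 0f32;
    xs.iter()
        .map(|x| {
            h = a * h + b * x;
            c * h
        })
        .collect()
}
```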
Nicolas Patry 74497e6bf7
Fixing the qwen tokenizer location. (#1693)
Using the chatglm one causes a bug where the "<|endoftext|>" token is not found.
2024-02-11 08:52:36 +01:00
Todsaporn Banjerdkit 8ab384e63d
docs: add trocr examples (#1692) 2024-02-10 16:14:50 +01:00
Laurent Mazare 27ffd644a9
Mention TrOCR in the readmes. (#1691) 2024-02-10 15:49:38 +01:00
Laurent Mazare 42ce593ec6
Use the repo config for trocr rather than hardcoding it + small tweaks. (#1689)
* Use the repo config for trocr rather than hardcoding it + small tweaks.

* Add support for the printed models.

* Fail with an appropriate error message on missing position embeddings.
2024-02-10 13:15:03 +01:00
Laurent Mazare 1c8d61f051
ChatGLM custom tokenizer. (#1687) 2024-02-10 10:47:04 +01:00
Laurent Mazare 90447bc993
Add the custom tokenizer. (#1686) 2024-02-09 17:36:50 +01:00
Laurent Mazare 40ce16001b
Use the proper endoftext token for qwen. (#1685) 2024-02-09 17:02:03 +01:00
Laurent Mazare 5657e596cd
Add the Qwen2 model (#1684)
* Initial check-in for the qwen2 model.

* More qwen2 inference.

* Polish the qwen example.

* Fix the rope basis.

* Get the inference to work.

* Support different model sizes.
2024-02-09 15:02:49 +01:00
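"Fix the rope basis" concerns the rotary-embedding frequencies; the standard basis is `inv_freq[i] = 1 / theta^(2i / head_dim)`, where theta is the rope base (classically 10_000, larger for some models). A sketch:

```rust
/// Inverse frequencies for rotary embeddings over a head dimension.
fn rope_inv_freqs(head_dim: usize, theta: f32) -> Vec<f32> {
    (0..head_dim / 2)
        .map(|i| 1.0 / theta.powf(2.0 * i as f32 / head_dim as f32))
        .collect()
}
```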
Laurent Mazare 0dee8ea19b
Add the ChatGLM model. (#1237)
* Add the ChatGLM model.

* Rotary embeddings.

* Add to the forward pass.

* Add to the forward pass.

* Add the rotary embeddings.

* Add the KV cache.

* Add the chatglm example.

* Bugfix.

* More glm fixes.

* Fix some shape issues.

* Get the inference to work.
2024-02-09 11:51:38 +01:00
Laurent Mazare 020a979de2
Fix clippy lints for 1.76. (#1682) 2024-02-08 16:48:47 +01:00
Guoqing Bao 678f64dd27
Fix token generation in bilingual models (non-English outputs) (#1668)
Co-authored-by: Guoqing Bao <guoqing.bao@enflame-tech.com>
2024-02-06 12:03:53 +01:00
Tarek 153c940a9c
Update docs to reflect current usage of example (#1610)
modified:   candle-examples/examples/onnx/README.md
2024-02-04 11:59:47 +01:00
Laurent Mazare 50be8a98ba
Quantized support for stable-lm2. (#1654)
* Quantized support for stable-lm2.

* Quantized support for v2-zephyr.
2024-02-04 11:57:05 +01:00
Jani Monoses d32abbce53
Add StableLM-2, StableLM Code and Zephyr variants (#1650)
* Add StableLM Code and Zephyr variants

* Add V2 models

* Update README
2024-02-03 14:58:41 +01:00
Hubert Shelley dfab45e1c8
Supports more audio formats (#1628)
* Supports more audio formats

* Simplify the handling of the different buffer types.

* Check the sample rate.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-02-03 14:26:04 +01:00
Jani Monoses a52d407ae6
Add ConvNeXt model. (#1604) 2024-02-03 13:34:28 +01:00
Nicolas Patry 403680f17d
Quantized GGUF style (#1523)
* Metal quantized modifications proposal.

- Add a device param, wherever needed.
- Create new QMetal storage thing that implements QuantizedType.
- Update everywhere needed.

Fix Python.

Fixing examples.

Fix: fmt + clippy + stub.

Moving everything around.

Only missing the actual implems.

Fixing everything + adding dequantized kernels.

More work.

Fixing matmul.

Fmt + Clippy

Some clippy fixes.

Working state.

Q2K Metal -> Bugged (also present in GGML).
Q4K CPU -> Bugged (present previously, new test catch it).
Q5K CPU -> Bugged (present previously).
Q8_1 Both -> Never really implemented it seems
Q8K metal -> Never implemented in metal

Fixing Q2K bug (present in ggml).

* Cleanup.

* Fix the rebase.

* Removing the fences speeds everything up and *is* correct this time...

* Cleanup the fence.

* After rebase.

* Bad code removal.

* Rebase after phi2 merge + fix replit default to CPU.

* Making the CI happy.

* More happy tests.

---------

Co-authored-by: Nicolas Patry <nicolas@Nicolass-MacBook-Pro.local>
2024-01-17 10:27:58 +01:00
Jani Monoses 5270224f40
Add MobileOne model. (#1595)
* Add MobileOne model.

* Clippy fixes

* Remove a comment.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-16 06:34:16 +01:00
Laurent Mazare ea36f3b11f
Use the new phi model by default. (#1589) 2024-01-15 12:30:27 +01:00
Laurent Mazare 539ead927a
Update the Phi model to use the updated architecture. (#1580)
* Update the Phi model to use the updated architecture.

* Add more of the phi model.

* Repeat KV + caching.

* Apply the rotary embeddings.

* Add support for the new phi model in the phi example.

* Fix a couple glitches.

* Fix a couple more glitches.
2024-01-13 17:38:27 +01:00
ivarflakstad e90bcdcc7c
Metal: f16 and bf16 where_cond + benchmark (#1545)
* Use cfg to separate benchmark results based on features

* Add metal where_cond for f16 and bf16. Add benchmark

* Remove allow pragma

* Avoid some unnecessary returns.

* Improve benchmarks layout

* Updated feature separated benchmarks

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-12 11:18:11 +01:00
Laurent Mazare 8e06bfb4fd
Mention VGG in the readme. (#1573) 2024-01-12 09:59:29 +01:00
Laurent Mazare 6242276c09
Pin the revision used for phi-v2 + make it the default. (#1572)
* Pin the revision used for phi-v2 + make it the default.

* Tweak the custom-ops build.
2024-01-12 09:19:30 +01:00
Jani Monoses 2480c5dbdd
Add RepVGG model. (#1561)
* Add RepVGG model.

* Add RepVGG README

* Extract var to top level

* Replace hashmap with a match

* Add a variant for the model kind + avoid some unnecessary config cloning.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-01-11 07:07:40 +01:00
Laurent Mazare 89b5a06858
Use bindgen-cuda for the custom-kernel example. (#1536)
* Use bindgen-cuda for the custom-kernel example.

* Only depend on the kernels when cuda is enabled.

* Skip rustfmt.
2024-01-07 17:18:46 +01:00
Nicolas Patry b4cb982e49
Simplifying our internal cargo dependencies. (#1529) 2024-01-07 12:04:14 +01:00
optman 84250bf52f
fix index_pos bug when kv cache is disabled. (#1517)
* fix index_pos bug when kv cache is disabled

* Tweak the fix.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-01-06 11:43:01 +01:00
stano 03ce8caf40
Format properly the Stable Diffusion example run with params (#1511)
Move the --sd-version flag out of the prompt.
2024-01-01 11:13:35 +01:00
Laurent Mazare b0fe5e4453
Do not implement Module for BatchNorm. (#1513) 2024-01-01 10:13:13 +01:00
Laurent Mazare 1fb2dd905c
Add support for tiny-llama-1.1b. (#1512) 2023-12-31 12:18:25 +01:00
s-casci 51e577a682
Add Policy Gradient to Reinforcement Learning examples (#1500)
* added policy_gradient, modified main, ddpg and README

* fixed typo in README

* removed unnecessary imports

* small refactor

* Use clap for picking up the subcommand to run.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-30 09:01:29 +01:00
Laurent Mazare 1e442d4bb9
Fix lints for clippy 1.75. (#1494) 2023-12-28 20:26:20 +01:00
Laurent Mazare d35f0a1376
Bump the crate version to 0.3.3. (#1490) 2023-12-28 13:38:30 +01:00
Laurent Mazare 996a7f2e24
Rework the llama example config, add the solar model. (#1485) 2023-12-26 22:24:04 +01:00
Laurent Mazare 3071ea6c3e
Use the new hub helper function. (#1484) 2023-12-26 09:44:30 +01:00
Laurent Mazare 37c539f2b7
Helper function to load sharded safetensors files (#1481)
* Fix the quantized mistral example.

* Add a helper function to load sharded safetensors weights.

* Use the sharded loader.
2023-12-25 21:49:21 +01:00
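Sharded checkpoints ship a `model.safetensors.index.json` whose `weight_map` records which shard holds each tensor, so loading starts by collecting the distinct shard files. A sketch of that step (the helper added here lives in candle-examples):

```rust
use std::collections::HashSet;

/// Extract the set of shard filenames from an index json string.
fn shard_files(index_json: &str) -> anyhow::Result<Vec<String>> {
    let json: serde_json::Value = serde_json::from_str(index_json)?;
    let weight_map = json["weight_map"]
        .as_object()
        .ok_or_else(|| anyhow::anyhow!("no weight_map in index"))?;
    let files: HashSet<String> = weight_map
        .values()
        .filter_map(|v| v.as_str().map(str::to_string))
        .collect();
    Ok(files.into_iter().collect())
}
```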
Laurent Mazare 7135791dd5
Fix the quantized mistral example. (#1478) 2023-12-25 09:31:24 +01:00
Laurent Mazare 88589d8815
Support mistral instruct v0.2. (#1475)
* Support mistral instruct v0.2.

* Use the safetensors model now that they are available.
2023-12-23 16:18:49 +01:00
Laurent Mazare 5b35fd0fcf
MMLU evaluation for Phi. (#1474)
* MMLU evaluation for Phi.

* Improve the evaluation.
2023-12-23 15:28:36 +01:00
Laurent Mazare 78d982e1bd
Fix for mamba 2.8b. (#1472) 2023-12-23 11:01:39 +01:00
Laurent Mazare d8b9a727fc
Support different mamba models. (#1471) 2023-12-23 10:46:02 +01:00
Laurent Mazare ceb78d3e28
Sketch the minimal mamba example. (#1465)
* Sketch the minimal mamba example.

* Fix rustfmt.

* Forward pass for mamba.

* Finish the forward pass.

* Inference fixes.

* Bugfixes.

* More fixes.

* Add a readme.
2023-12-22 00:28:50 +01:00
Nicolas Patry 9fc210fae8
Merge pull request #1318 from huggingface/metal4
Starting to fix some tests.
2023-12-20 15:37:31 +01:00
Laurent Mazare 94817dac56
Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
Laurent Mazare 1e86717bf2
Fix a couple typos (#1451)
* Mixtral quantized instruct.

* Fix a couple typos.
2023-12-17 05:20:05 -06:00
Laurent Mazare c4cfcf1539
Tweak the readme for phi and the default sample length. (#1450) 2023-12-16 18:11:36 -06:00
Laurent Mazare 1782e93de6
Mixtral quantized instruct. (#1447) 2023-12-16 16:16:39 -06:00
Laurent Mazare e12cbfd73b
Update the readme to mention mixtral. (#1443) 2023-12-15 19:29:03 -06:00
Laurent Mazare 30a958e5dd
Quantized mixtral model (#1442)
* Add the Mixtral model.

* Add more of the mixtral layers.

* Add the final layers for mixtral.

* Sketch the expert selection.

* Add some expert routing logic.

* Hopefully finish the routing logic for mixtral.

* Add the mixtral example.

* Fix the weight filenames.

* Bugfix.

* Another fix.

* Yet another fix + remove the unused pragma.

* Shape fix.

* Support for quantized mixtral.

* Support mixtral in the quantized example.

* Mlp or moe type.

* Fix the expert field namings.

* Refactor the mlp bit.

* More MoE logic.

* Add the MoE quantized logic.

* Fix the experts length.
2023-12-15 19:16:06 -06:00
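The expert-routing bullets above follow the mixtral recipe: softmax over the router logits, keep the top-2 experts, renormalize their weights, and mix the expert outputs. A sketch of the selection step:

```rust
/// Return the top-2 (expert index, weight) pairs with weights renormalized.
fn route_top2(router_logits: &[f32]) -> Vec<(usize, f32)> {
    let max = router_logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = router_logits.iter().map(|l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> =
        exps.iter().map(|e| e / sum).enumerate().collect();
    // Descending by probability; assumes no NaN.
    probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    probs.truncate(2);
    let norm: f32 = probs.iter().map(|(_, p)| *p).sum();
    probs.into_iter().map(|(i, p)| (i, p / norm)).collect()
}
```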
Laurent Mazare 614842b311
Add the Mixtral model. (#1437)
* Add the Mixtral model.

* Add more of the mixtral layers.

* Add the final layers for mixtral.

* Sketch the expert selection.

* Add some expert routing logic.

* Hopefully finish the routing logic for mixtral.

* Add the mixtral example.

* Fix the weight filenames.

* Bugfix.

* Another fix.

* Yet another fix + remove the unused pragma.

* Shape fix.

* Add a readme.
2023-12-15 14:19:56 -06:00
niu tech 79eab519fd
Fix phi example (#1436)
* Fix phi example

* Remove the cuda mention.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-15 07:01:10 -06:00
Laurent Mazare 5e33c85c8f
Quantized version for phi-v2. (#1430)
* Quantized version for phi-v2.

* More quantized support.
2023-12-13 21:16:34 -06:00
Laurent Mazare 2b3a018be7
Support for phi-2. (#1429)
* Support for phi-2.

* Use the v2 naming scheme.
2023-12-13 20:59:29 -06:00
Juarez Bochi 9bd94c1ffa
Speed up bert with approx gelu (#1410) 2023-12-06 17:46:37 +01:00
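The "approx gelu" is the tanh approximation, which is cheaper to evaluate than the erf-based exact gelu:

```rust
/// gelu(x) ≈ 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
fn gelu_approx(x: f32) -> f32 {
    const SQRT_2_OVER_PI: f32 = 0.797_884_6;
    0.5 * x * (1.0 + (SQRT_2_OVER_PI * (x + 0.044715 * x * x * x)).tanh())
}
```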
emka 8418154ee0
Add nvcc ccbin support to examples (#1401) 2023-12-03 16:01:16 +00:00
emka 99b7273b03
Add compute cap env support to examples (#1400) 2023-12-03 16:00:24 +00:00
Laurent Mazare 16161145ae
Add the leo models to the quantized examples. (#1398) 2023-12-03 12:30:41 +00:00
Laurent Mazare 0738df5290
Add more mentions to SDXL Turbo in the readme. (#1397) 2023-12-03 10:41:21 +00:00
Edwin Cheng 37bf1ed012
Stable Diffusion Turbo Support (#1395)
* Add support for SD Turbo

* Set Leading as default in euler_ancestral discrete

* Use the appropriate default values for n_steps and guidance_scale.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2023-12-03 08:37:10 +01:00
Lucas de Ávila Martins 5aa1a65dab
Add quantized Starling, fix open-chat prompt (#1393)
* Add quantized Starling, fix open-chat prompt

* Fix open-chat and starling prompts
2023-12-02 16:47:19 +00:00
Nicolas Patry 4349ff1fc2 Starting to fix some tests.
Few fixes.

Going back on remote metal-rs.

Reusing a single buffer (for now) to speed things up.

Adding some half kernels.

All tests now panic instead of failing randomly.

Putting back f16 index select.

Add erf.

Working version for llama2-c.

Fixes + cache compute_pipeline_state.

BF16 metal fix.

Remove some prints.

new_owned -> new()..to_owned().

Better batched matmul.

Metal operational.

Reuse buffers on our own reference counts.

Tmp gemm.

Revert "Tmp gemm."

This reverts commit c65f68e988.

Interleave committing.

Speeding up copies using blit.

Fmt.

Fmt.

Remove the assert!

Fmt all.

Fixes after big rebase.

Add softmax for half and bfloat + tests

Fixing Llama example + accumulate softmax in float.
2023-11-30 11:30:31 +01:00
Laurent Mazare 7c3cfd1086
Use the llama weight names for the Yi example. (#1381) 2023-11-27 20:42:52 +00:00
Odunayo 762e996ce6
Distilbert (#1366)
* add bce with logit loss

* add bce with logit loss

* remove imports

* fix tiny bug

* add test documentation and refactor function

* fix test cases and formatting

* distilbert files

* Apply various cleanups.

* More cleanups.

* More polish.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-11-24 15:09:14 +00:00
MilkFather ca19a9af62
Fix linspace implementation (#1358)
* Fix linspace implementation

`steps` should be strictly greater than 1 to make it consistent with the context.

* Handle steps == 0 and steps == 1.

* Fix rustfmt.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2023-11-23 07:35:13 +00:00
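The fix amounts to special-casing small step counts so the divisor `steps - 1` never reaches zero; a sketch of the corrected logic:

```rust
fn linspace(start: f64, stop: f64, steps: usize) -> Vec<f64> {
    match steps {
        0 => vec![],      // empty result
        1 => vec![start], // a single point, no interval to divide
        _ => {
            let delta = (stop - start) / (steps - 1) as f64;
            (0..steps).map(|i| start + i as f64 * delta).collect()
        }
    }
}
```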
Marcus Asteborg ec23427d60
Ensure to copy data to cpu before iterating. (#1360) 2023-11-23 07:24:25 +00:00
Lucas de Ávila Martins f49bf6a81d
Fix OpenChat 3.5 tokenizer (#1347) 2023-11-19 18:48:04 +00:00