Commit Graph

1793 Commits

Author SHA1 Message Date
Sacha Arbonel 11ea7aac4d
tests (#1724) 2024-02-23 06:35:46 +01:00
Daniel Varga 32eb56d6b3
Fix typo in README (#1740) 2024-02-22 12:35:26 +01:00
Laurent Mazare 28057781aa
Make the cache for the llama model explicit too. (#1745) 2024-02-22 12:04:33 +01:00
laurent 544018b6d0 Explicit caching in llama2.c. 2024-02-22 10:22:03 +01:00
Laurent Mazare c753f72c85
Support for attention bias in gemma + refactor things a bit. (#1744)
* Support for attention bias in gemma + refactor things a bit.

* Fix the cuda tests.
2024-02-22 09:35:28 +01:00
Kirpal Grewal 8013b50829
Add grads for interpolate1d (#1742)
* add backprop for interpolate1d

* fix clippy lint

* correct fix clippy lint
2024-02-22 08:44:01 +01:00
Laurent Mazare 45d5322d62
Add the Gemma models. (#1741)
* Add the Gemma models.

* Add the gemma example.

* Adapt the RmsNorm.

* Get the 2b model to work.

* 7b support.

* Use the config head dim.

* Yet another fix.

* Make the matrices contiguous.

* Also get the 7b model to work.

* And add to the readme.
2024-02-21 22:02:50 +01:00
Laurent Mazare a2cb2edead
Add a couple backtraces on cpu errors. (#1738) 2024-02-20 19:54:13 +01:00
Laurent Mazare fc67d878bb
Bugfix for conv-transpose1d (#1734)
* Add a currently broken test.

* Bugfix + fix test.
2024-02-19 09:04:49 +01:00
Laurent Mazare 3ba37443e5
Bugfix for applying the bias in conv1d-transpose. (#1732) 2024-02-18 22:51:20 +01:00
Laurent Mazare 1fb728772d
Support for groups in conv-transpose1d. (#1731)
* Groups support in conv-transpose-1d.

* Remove dangling file.
2024-02-18 21:28:07 +01:00
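Several entries above touch conv-transpose1d (groups support, bias handling, the CUDA kernel). For reference, a minimal single-channel, stride-1 transposed convolution with no padding can be sketched as below; this is a generic illustration of the operation, not candle's kernel:

```rust
// Naive 1-D transposed convolution, single channel, stride 1, no padding:
// each input element scatters a scaled copy of the kernel into the output.
// Output length is input_len + kernel_len - 1.
fn conv_transpose1d(input: &[f32], kernel: &[f32]) -> Vec<f32> {
    let mut out = vec![0.0f32; input.len() + kernel.len() - 1];
    for (i, &x) in input.iter().enumerate() {
        for (k, &w) in kernel.iter().enumerate() {
            out[i + k] += x * w;
        }
    }
    out
}

fn main() {
    // [1, 2] with kernel [1, 1, 1]: each input spreads across 3 outputs.
    println!("{:?}", conv_transpose1d(&[1.0, 2.0], &[1.0, 1.0, 1.0]));
}
```

Groups, strides, padding, and bias (the subjects of the fixes above) all layer on top of this basic scatter.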
Laurent Mazare cb86b0c82c
Fix float unpickling. (#1730) 2024-02-18 19:33:55 +01:00
Laurent Mazare 6284ad784c
Module implementation for options. (#1728) 2024-02-18 14:12:55 +01:00
Laurent Mazare 678d44a7f6
Expose the weights and biases in transposed convolutions. (#1727) 2024-02-18 10:35:01 +01:00
Laurent Mazare 41416d2376
Expose more conv1d functions/structs. (#1726) 2024-02-17 18:50:55 +01:00
Laurent Mazare 5ebcfeaf0f
Make the r, k, v tensors contiguous. (#1719) 2024-02-16 09:17:35 +01:00
Laurent Mazare 7c7400fb63
Use the tokenizer-output-stream in the llama example. (#1715)
* Use the tokenizer-output-stream in the llama example.

* Also use tokenizer-output-stream for llama2-c.
2024-02-15 16:47:33 +01:00
Laurent Mazare 058a910d0e
Add a readme for rwkv. (#1712) 2024-02-14 15:31:33 +01:00
Laurent Mazare 26fe162ab5
Custom tokenizer for rwkv. (#1711)
* Custom tokenizer for rwkv.

* Custom tokenizer.

* Getting the tokenizer to work.
2024-02-14 15:11:38 +01:00
Laurent Mazare 121a71e01f
Fix the silu cuda kernel. (#1710) 2024-02-14 11:08:18 +01:00
Laurent Mazare 2d5f2a728d
Add the RWKV model (v5). (#1707)
* Start adding the RWKV model.

* More of the forward step.

* Handle rescaling.

* FeedForward.

* More work on RWKV.

* Better state tracking.

* Finish a first pass on forward.

* Fix the shape mismatches.

* Do not rescale in f32.

* Rename to rwkv-v5.

* Add the new models to the readme.
2024-02-14 10:58:32 +01:00
Jani Monoses 68f7655895
Add ConvNeXt-V2 and smaller model variants. (#1709) 2024-02-14 10:53:07 +01:00
OlivierDehaene b60064780d
feat: add silu activation function (#1706)
* feat: add silu activation function

* use silu/arg in grad

* update candle-nn

* use node
2024-02-14 10:27:22 +01:00
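The SiLU activation added here is defined as silu(x) = x · σ(x), where σ is the logistic sigmoid. A minimal scalar sketch of the function and the derivative used for the gradient (not candle's actual kernel):

```rust
// SiLU (a.k.a. swish): silu(x) = x * sigmoid(x).
fn silu(x: f64) -> f64 {
    x / (1.0 + (-x).exp())
}

// Derivative used for backprop: silu'(x) = s * (1 + x * (1 - s)), s = sigmoid(x).
fn silu_grad(x: f64) -> f64 {
    let s = 1.0 / (1.0 + (-x).exp());
    s * (1.0 + x * (1.0 - s))
}

fn main() {
    println!("silu(1.0)  = {:.4}", silu(1.0));
    println!("silu'(0.0) = {:.4}", silu_grad(0.0));
}
```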
Nicolas Patry 14010a8498
Update our cuda runner. (#1705)
* Update our cuda runner.

* Fix install rust.

* Simplify.

* Docker in docker.

* Install curl

* Install curl

* No sudo.

* devel

* Put curl again.

* Add missing deps.

* pkg-config.

* Cleanup.
2024-02-13 19:06:15 +01:00
Laurent Mazare 0de0795220
Qmetal tweaks (#1704)
* Add the dummy qmetal backend.

* Fix the metal compilation.
2024-02-13 18:11:17 +01:00
Nicolas Patry c1b418586c
Fixing quantized llama demo on metal. (#1703) 2024-02-13 16:28:56 +01:00
Laurent Mazare ad73e93da2
Detach the tensors on batch-norm eval. (#1702)
* Detach the tensors on batch-norm eval.

* Fix pyo3 bindings.

* Black tweak.

* Formatting.

* Also update the pyo3-onnx formatting.

* Apply black.
2024-02-13 14:26:32 +01:00
drbh 13c67226e6
feat: support microphone whisper streaming (#1678)
* feat: support microphone whisper streaming

* fix: cleanup print stmts and adjust how input is read

* fix: remove incorrect comment

* feat: split into new example and simplify

* fix: feature flag example file

* fix: fmt fixes

* feat: simplify and remove redundant files
2024-02-12 18:01:21 +01:00
Laurent Mazare d0aa197b07
ConvTranspose1d cuda support. (#1697)
* ConvTranspose1d cuda support.

* Add the conv-transpose1d kernel.

* Remove some unused variables.
2024-02-12 15:03:18 +01:00
Laurent Mazare 274bf11633
Support defaultdict in PyTorch checkpoints. (#1696)
* Support defaultdict in PyTorch checkpoints.

* Fix clippy lint.
2024-02-12 10:26:56 +01:00
Laurent Mazare 1e26d539d9
Improved mamba model optimized for inference (#1694)
* Sketch the mamba model for inference.

* Complete the forward pass.

* Add the mamba example.

* Optimize the selective-scan part.

* Fix a couple shape mismatches and get inference to work.

* Tweak the readmes.

* More readme tweaks.
2024-02-11 17:04:57 +01:00
Nicolas Patry 74497e6bf7
Fixing the qwen tokenizer location. (#1693)
Using the chatglm one causes a bug where the "<|endoftext|>" token is not
found.
2024-02-11 08:52:36 +01:00
Todsaporn Banjerdkit 8ab384e63d
docs: add trocr examples (#1692) 2024-02-10 16:14:50 +01:00
Laurent Mazare 27ffd644a9
Mention TrOCR in the readmes. (#1691) 2024-02-10 15:49:38 +01:00
Laurent Mazare bf20cc854c
Support sinusoidal embeddings in trocr. (#1690)
* Support sinusoidal embeddings in trocr.

* Support tie-word-embeddings.
2024-02-10 15:17:51 +01:00
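The sinusoidal embeddings added for trocr follow the classic transformer scheme: even dimensions carry sin(pos / 10000^(2i/d)) and odd dimensions the matching cosine. A generic sketch of that scheme, not the trocr implementation itself:

```rust
// Classic sinusoidal position embedding for one position: even indices get
// sin(pos / 10000^(2i/d)), odd indices the cosine at the same frequency.
fn sinusoidal_embedding(pos: usize, dim: usize) -> Vec<f32> {
    (0..dim)
        .map(|i| {
            // Pair index i/2 determines the frequency shared by sin/cos.
            let freq = 10000f32.powf((2 * (i / 2)) as f32 / dim as f32);
            let angle = pos as f32 / freq;
            if i % 2 == 0 { angle.sin() } else { angle.cos() }
        })
        .collect()
}

fn main() {
    // At position 0 all angles are zero: sin -> 0, cos -> 1.
    println!("{:?}", sinusoidal_embedding(0, 4));
}
```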
Laurent Mazare 42ce593ec6
Use the repo config for trocr rather than hardcoding it + small tweaks. (#1689)
* Use the repo config for trocr rather than hardcoding it + small tweaks.

* Add support for the printed models.

* Fail with an appropriate error message on missing position embeddings.
2024-02-10 13:15:03 +01:00
Laurent Mazare 67589791d2
Remove the unused pragma in vit + handle the final layernorm. (#1688) 2024-02-10 11:08:50 +01:00
Laurent Mazare 1c8d61f051
ChatGLM custom tokenizer. (#1687) 2024-02-10 10:47:04 +01:00
Laurent Mazare 90447bc993
Add the custom tokenizer. (#1686) 2024-02-09 17:36:50 +01:00
Laurent Mazare 40ce16001b
Use the proper endoftext token for qwen. (#1685) 2024-02-09 17:02:03 +01:00
Laurent Mazare 5657e596cd
Add the Qwen2 model (#1684)
* Initial check-in for the qwen2 model.

* More qwen2 inference.

* Polish the qwen example.

* Fix the rope basis.

* Get the inference to work.

* Support different model sizes.
2024-02-09 15:02:49 +01:00
Laurent Mazare 0dee8ea19b
Add the ChatGLM model. (#1237)
* Add the ChatGLM model.

* Rotary embeddings.

* Add to the forward pass.

* Add to the forward pass.

* Add the rotary embeddings.

* Add the KV cache.

* Add the chatglm example.

* Bugfix.

* More glm fixes.

* Fix some shape issues.

* Get the inference to work.
2024-02-09 11:51:38 +01:00
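The rotary embeddings mentioned in the ChatGLM and Qwen2 entries rotate consecutive pairs of query/key dimensions by a position-dependent angle. A minimal sketch for a single vector (generic RoPE, not the model-specific code above):

```rust
// Rotary position embedding (RoPE) applied to one vector: each pair
// (x[2i], x[2i+1]) is rotated by theta = pos / base^(2i/d).
fn apply_rope(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    let mut out = x.to_vec();
    for i in 0..d / 2 {
        let theta = pos as f32 / base.powf((2 * i) as f32 / d as f32);
        let (sin, cos) = theta.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        out[2 * i] = a * cos - b * sin;
        out[2 * i + 1] = a * sin + b * cos;
    }
    out
}

fn main() {
    // Position 0 means every rotation angle is zero: the vector is unchanged.
    println!("{:?}", apply_rope(&[1.0, 2.0, 3.0, 4.0], 0, 10000.0));
}
```

The "rope basis" fix mentioned in the Qwen2 entry concerns exactly this base exponent term.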
drbh 9cadd4e644
feat: support multithread spectrogram and small perf tweaks (#1674)
* feat: support multithread spectrogram and small perf tweaks

* feat: clippy improvement for loop variable

* fix: add back speed up scale down logic

* fix: readd mirroring logic

* feat: prefer scoped thread and simplify/improve logic/traits
2024-02-08 21:54:12 +01:00
Laurent Mazare 020a979de2
Fix clippy lints for 1.76. (#1682) 2024-02-08 16:48:47 +01:00
Laurent Mazare cdc3823d8f
Pickle support: dig within the _rebuild_parameter calls. (#1681) 2024-02-08 13:09:49 +01:00
Dilshod Tadjibaev e5eb9602d0
Add support for loading Fortran contiguous tensors (#1672)
* Add support for loading Fortran contiguous tensors

This commit adds support for loading Fortran contiguous (column-major) tensors. Previously, tensor loading only handled C contiguous tensors and failed with an error otherwise; Fortran contiguous tensors are now handled by reversing their dimensions after loading, broadening compatibility with different tensor layouts.

- Check whether a tensor is Fortran contiguous using the `is_fortran_contiguous` flag.
- For Fortran contiguous tensors, reverse the dimensions after loading to correctly represent their layout in memory.
- Continue to bail out with an error for tensors that are neither C contiguous nor Fortran contiguous, maintaining the previous behavior.

* Add reshape step to handle fortran contiguous case

* Skip fortran contiguous fix if rank is < 2

* Fail on rank 0, 1 if contiguous
2024-02-07 21:49:59 +01:00
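The dimension-reversal trick described in this entry works because a column-major buffer of shape [r, c] is byte-for-byte a row-major buffer of shape [c, r]; a transpose (or an explicit copy) then restores the logical layout. A standalone sketch of the 2-D copy, independent of candle's actual loader:

```rust
// Copy a Fortran-order (column-major) 2-D buffer into C (row-major) order.
// In column-major storage, element (i, j) of a [rows, cols] tensor lives
// at offset i + j * rows; in row-major it lives at i * cols + j.
fn fortran_to_c_order(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols);
    let mut out = vec![0.0f32; data.len()];
    for i in 0..rows {
        for j in 0..cols {
            out[i * cols + j] = data[i + j * rows];
        }
    }
    out
}

fn main() {
    // Column-major storage of [[1, 2, 3], [4, 5, 6]] is [1, 4, 2, 5, 3, 6].
    println!("{:?}", fortran_to_c_order(&[1.0, 4.0, 2.0, 5.0, 3.0, 6.0], 2, 3));
}
```

The rank < 2 special cases in the follow-up commits exist because for rank 0 and 1 the two orders coincide, so no fix-up is needed.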
Dilshod Tadjibaev b75e8945bc
Enhance pickle to retrieve state_dict with a given key (#1671) 2024-02-06 21:17:33 +01:00
Daniël de Kok a90fc5ca5a
Add `VarBuilder::from_backend` (#1670)
`candle-nn` already exposes a trait to define custom backends. However,
it's not possible to actually construct a `VarBuilder` with a custom
backend because the constructor is not exposed.

This change makes the constructor public and renames it from `new` to
`from_backend` so that it is not mistaken for the primary constructor
(which could be confusing to users).
2024-02-06 15:26:11 +01:00
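The pattern this change enables can be sketched with hypothetical, heavily simplified trait and struct definitions (the real backend trait and `VarBuilder` in candle-nn carry dtype, shape, and device information):

```rust
// Hypothetical minimal backend trait standing in for candle-nn's custom
// backend trait: a source that can look up a named parameter.
trait Backend {
    fn get(&self, name: &str) -> Option<f32>;
}

// Simplified stand-in for VarBuilder: wraps a user-supplied backend,
// constructed via `from_backend` rather than `new` so the name makes
// clear it is not the primary constructor.
struct VarBuilder {
    backend: Box<dyn Backend>,
}

impl VarBuilder {
    fn from_backend(backend: Box<dyn Backend>) -> Self {
        Self { backend }
    }

    fn get(&self, name: &str) -> Option<f32> {
        self.backend.get(name)
    }
}

// Example custom backend that returns zero for every parameter.
struct ZeroBackend;
impl Backend for ZeroBackend {
    fn get(&self, _name: &str) -> Option<f32> {
        Some(0.0)
    }
}

fn main() {
    let vb = VarBuilder::from_backend(Box::new(ZeroBackend));
    println!("{:?}", vb.get("weight"));
}
```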
Laurent Mazare adfae2460a
Fix rustfmt. (#1669) 2024-02-06 12:06:06 +01:00
Guoqing Bao 678f64dd27
Fix token generation in bilingual models (non-English outputs) (#1668)
Co-authored-by: Guoqing Bao <guoqing.bao@enflame-tech.com>
2024-02-06 12:03:53 +01:00