Commit Graph

2161 Commits

Author SHA1 Message Date
Eric Buehler 9182c828e6
Automatically upcast for to_u64 (#2244) 2024-06-04 11:32:36 +02:00
Taylor Ninesling 3f13ad3d79
Fix dataset id for MNIST (#2238) 2024-06-04 06:27:24 +02:00
chenwanqq cd4d941ed1
Add LLaVA support (#2234)
* first commit

* llava

* clippy and fmt

* some fixes

* minor fixes

* remove useless file

* refactor: Remove llava/constants.rs and update llava/mod.rs

* modify variable name

* modify code after clippy

* Minor tweaks.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
mokulus 03344d3c19
ONNX: Add Floor and Ceil (#2235) 2024-06-02 21:45:20 +02:00
Lionel Touati 1ec3b2cc18
add where_cond f32 for metal (#2236) 2024-06-02 14:30:06 +02:00
Laurent Mazare f7773d498a
Deactivate some book tests that break the CI. (#2233)
* Deactivate some book tests that break the CI.

* Clippy fix.
2024-06-01 09:44:22 +02:00
Eric Buehler 7abc3b8cd7
Bump cudarc version to 0.11.4 (#2230) 2024-06-01 08:18:35 +02:00
Laurent Mazare 46012ed31f
Another cudarc update. (#2229) 2024-05-30 22:27:06 +02:00
Laurent Mazare f3fade3b03
Update cudarc to 0.11.2. (#2227) 2024-05-29 18:50:52 +02:00
Dave Lage ea260aeffd
Add Debug, Clone, Deserialize to moondream config (#2222) 2024-05-28 06:08:00 +02:00
Laurent Mazare 0814dfd148
Add a metal kernel for col2im1d. (#2214)
* Add a metal kernel for col2im1d.

* Enable the col2im variant.

* Bugfix.

* Revert the quantized tweak.
2024-05-25 11:03:23 +02:00
Laurent Mazare 3ceca9901a
Enable the new layer-norm. (#2213)
* Enable the new layer-norm.

* Shape fixes.
2024-05-24 16:48:21 +02:00
Laurent Mazare 1df2bddccf
Add the layernorm specialized op. (#2212)
* Add the layernorm cuda kernels.

* Dedicated layer norm op.

* Add the slower variant.

* Plug the cuda implementation.

* Add the metal variant.

* Add a dedicated test.

* Bugfix.
2024-05-24 15:58:01 +02:00
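For context on the layer-norm entries above: the fused cuda/metal kernels plug in underneath the existing layer, so usage at the `candle_nn::LayerNorm` level is unchanged. A minimal sketch (illustrative only, not code from the PR):

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{LayerNorm, Module};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let hidden = 64;
    // Scale and shift parameters, plus epsilon for numerical stability.
    let ln = LayerNorm::new(
        Tensor::ones(hidden, DType::F32, &dev)?,
        Tensor::zeros(hidden, DType::F32, &dev)?,
        1e-5,
    );
    let xs = Tensor::randn(0f32, 1.0, (2, 7, hidden), &dev)?;
    // The specialized op (when available) is used behind this call.
    let _ys = ln.forward(&xs)?;
    Ok(())
}
```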
Laurent Mazare 6f0b807ffd
More efficient cuda implementation for ConvTranspose1d. (#2211)
* More efficient cuda implementation for ConvTranspose1d.

* Small tweak.
2024-05-24 11:05:43 +02:00
Laurent Mazare d54e02d73d
Avoid a contiguous call in the quantized phi 3 model. (#2209)
* Simplify the KvCache api.

* Avoid a contiguous call in the quantized phi3 model.
2024-05-23 21:24:55 +02:00
Laurent Mazare 45e235a747
Simplify the KvCache api. (#2207) 2024-05-23 17:07:21 +02:00
Laurent Mazare 31cf64147b
Add a couple kv-cache helper functions. (#2206) 2024-05-23 16:21:47 +02:00
Jani Monoses 77ea479a18
Add Phi-3 Medium (#2205) 2024-05-23 13:33:17 +02:00
Laurent Mazare 72e7ca529a
Add some missing where-cond kernels for metal. (#2203) 2024-05-22 09:44:52 +02:00
mokulus 7ff921c538
Add RandomNormal ONNX operator (#2200) 2024-05-21 21:47:32 +02:00
Laurent Mazare 9b8537a62f
Remove the deprecated wav crate in favor of hound. (#2202) 2024-05-21 21:43:35 +02:00
Laurent Mazare 7ebc3548e1
Use flash-attn in gemma. (#2195)
* Use flash-attn in gemma.

* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
Laurent Mazare eefc1c77ef
Support flash-attn in quantized phi3. (#2194) 2024-05-18 17:12:56 +02:00
Laurent Mazare 01545f7303
Add a slice_set op. (#2193)
* Add a slice_set op.

* Add some testing.

* Add the dedicated kv-cache module.

* Derive debug and clone.

* Expose more kv-cache functions.

* Return the current data when appending.

* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
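A minimal sketch of the dedicated kv-cache module described above, assuming `candle_nn::kv_cache::KvCache` exposes `new(dim, max_seq_len)` and an `append` that returns the data accumulated so far (per the "Return the current data when appending" note); shapes are illustrative:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Concatenate along dim 2 (the sequence axis), up to 512 positions.
    let mut cache = KvCache::new(2, 512);
    // Keys/values shaped (batch, heads, seq, head_dim).
    let k = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    let v = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    // Each append copies the new chunk in (slice_set under the hood) and
    // returns the keys/values accumulated so far.
    let (k_all, v_all) = cache.append(&k, &v)?;
    assert_eq!(k_all.dims(), v_all.dims());
    Ok(())
}
```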
Yin Guobing 349c3e806a
Support embedding model gte-Qwen1.5-7B-instruct (#2190)
* Support embedding model gte-Qwen1.5-7B-instruct

This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit makes
minimal modifications to the old Qwen2 implementation to support both
models.

An example is provided and has been verified against the official
PyTorch implementation.

* Avoid doing the 'last-token filtering' based on the absence of an attention mask.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
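The "last-token filtering" mentioned above refers to pooling one embedding per sequence from the final hidden state. A hedged sketch of the idea, not the exact code from the example:

```rust
use candle_core::{Result, Tensor};

/// Reduce (batch, seq_len, hidden) hidden states to one embedding per
/// sequence by keeping only the final position.
fn last_token_pool(hidden: &Tensor) -> Result<Tensor> {
    let (_batch, seq_len, _hidden) = hidden.dims3()?;
    hidden.narrow(1, seq_len - 1, 1)?.squeeze(1)
}
```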
Martin Stefcek bdaa34216a
chore: add fix for windows cudarc into the readme (#2189) 2024-05-16 14:32:50 +02:00
Daniel Varga cc80e065e5
Allow the threshold argument to be negative in the segment-anything example (#2187)
The threshold is 0.0 by default; negative values include more points,
expanding the mask, while positive values are more selective, making the
mask smaller.

Negative numbers start with a minus sign, which clap would normally
interpret as a flag.
2024-05-15 13:17:20 +02:00
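One way to handle the minus-sign ambiguity is clap's `allow_negative_numbers` setting. A minimal sketch with the derive API (the argument name here is illustrative):

```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// 0.0 by default; negative values expand the mask, positive ones shrink it.
    #[arg(long, default_value_t = 0.0, allow_negative_numbers = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold = {}", args.threshold);
}
```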
Harry Stern 13c64f6828
Fix VarBuilder::from_slice_safetensors (#2180)
Also implement SimpleBackend for SliceSafetensors

Signed-off-by: Harry Stern <harry@harrystern.net>
2024-05-12 07:26:06 +02:00
Laurent Mazare 21f82a5155
Add SliceSafetensors. (#2179)
* Add SliceSafetensors.

* And add some testing.
2024-05-11 13:15:42 +02:00
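A short sketch of the new entry point, assuming the signature `VarBuilder::from_slice_safetensors(bytes, dtype, device)`; the tensor name `"weight"` is hypothetical:

```rust
use candle_core::{DType, Device, Result};
use candle_nn::VarBuilder;

fn weights_from_bytes(bytes: &[u8]) -> Result<()> {
    // Build a VarBuilder over an in-memory safetensors buffer instead of
    // a file on disk.
    let vb = VarBuilder::from_slice_safetensors(bytes, DType::F32, &Device::Cpu)?;
    let _w = vb.get((16, 16), "weight")?;
    Ok(())
}
```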
Laurent Mazare 9cff7bc3f4
Make it possible to use TF32 accumulation in F32 matmuls. (#2178)
* Allow the use of tf32 accumulation in matmul.

* Better timings.

* Dummy versions for use when cuda is not enabled.
2024-05-11 12:28:39 +02:00
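A sketch of opting in to TF32 accumulation, assuming the toggle is exposed as `candle_core::cuda::set_gemm_reduced_precision_f32` (the commit notes that dummy versions exist when cuda is disabled, so the call should compile either way):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Trade a little precision for speed on Ampere+ GPUs; a no-op dummy
    // is used when the cuda feature is not enabled.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);

    let dev = Device::Cpu;
    let a = Tensor::randn(0f32, 1.0, (64, 64), &dev)?;
    let b = Tensor::randn(0f32, 1.0, (64, 64), &dev)?;
    let _c = a.matmul(&b)?;
    Ok(())
}
```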
Laurent Mazare d9bc5ec151
Switch cudarc back to dynamic linking. (#2176) 2024-05-09 10:35:44 +02:00
Sidharth Rajaram 84328e2b60
Update cudarc requirement from 0.11.0 to 0.11.1 (#2174)
* Upgrade the cudarc dependency from v0.11.0 to v0.11.1, as that version resolves a compile-time bug.

See: https://github.com/huggingface/candle/issues/2173
2024-05-08 20:40:36 +02:00
dependabot[bot] 82b641fd27
Update cudarc requirement from 0.10.0 to 0.11.0 (#2165)
* Update cudarc requirement from 0.10.0 to 0.11.0

Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: cudarc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use the default cuda version.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-05-06 17:12:14 +02:00
Laurent Mazare 01794dc16e
Use write rather than try-write on the metal rw-locks. (#2162) 2024-05-05 07:22:46 +02:00
Laurent Mazare a75cd8164f
Force the revision for the phi3-llama quantized models. (#2159) 2024-05-04 10:41:18 +02:00
Laurent Mazare b13a82a438
Separate quantized phi-3 implementation. (#2157)
* Separate quantized phi-3 implementation.

* Integrate the quantized phi3 model.

* Small fixes, get the generation to work properly.

* Keep the old llama implementation around.

* Change the default.
2024-05-04 10:14:57 +02:00
Laurent Mazare 59b18d974e
Pin the version used for the quantized phi 3 gguf file. (#2156) 2024-05-03 15:03:22 +02:00
Laurent Mazare 89f53b9d7b
Bump the version number to 0.5.1. (#2155)
* Bump the version number to 0.5.1.

* Fix clippy lints for 1.78.

* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare a09d451d11
Support top-k in the llama example. (#2150) 2024-05-01 22:25:47 +02:00
Laurent Mazare fa06f5f5f9
F16/BF16 bugfix (bis). (#2143)
* F16/BF16 bugfix (bis).

* Another fix.

* Yet another fix.
2024-04-29 14:08:44 +02:00
Laurent Mazare 09d4845aa8
Bugfix the recent f16/bf16 changes. (#2142) 2024-04-29 13:30:11 +02:00
Jeffrey Dallatezza a0d03aded1
Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. (#2124)
* When converting a tensor to a variable, clone if the tensor is already a variable.

* Add a test to ensure training a batch norm works with VarMaps

---------

Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@Jeffreys-Laptop.local>
2024-04-29 11:21:53 +02:00
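A sketch of the aliasing bug this fixes: with the clone in place, a `Var` built from another variable's tensor gets its own storage (shapes and values illustrative):

```rust
use candle_core::{DType, Device, Result, Tensor, Var};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let v1 = Var::zeros((2, 2), DType::F32, &dev)?;
    // Building a Var from an existing Var's tensor now clones the storage,
    // so updating v2 no longer silently mutates v1.
    let v2 = Var::from_tensor(v1.as_tensor())?;
    v2.set(&Tensor::ones((2, 2), DType::F32, &dev)?)?;
    // v1 keeps its original zeros.
    assert_eq!(v1.to_vec2::<f32>()?, vec![vec![0.0, 0.0], vec![0.0, 0.0]]);
    Ok(())
}
```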
MilkFather 3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114)
* add sigmoid op

* small fix

* add as a method on `Tensor`

* implement gradient calculation for sigmoid

* add sigmoid tests

* we should have a specialized op for this

* fix clippy

* fix clippy 2

* Revert all previous commits in favor of a `CustomOp` based solution

* use `CustomOp1` implementation

* fix rustfmt

* experimental add metal impl

* add cuda kernel impl

* fix fmt

* Add a test + reduce some cuda duplication.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
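The gradient being fixed here is the identity s'(x) = s(x) * (1 - s(x)). A hedged sketch checking it through backprop, assuming the op is exposed as `candle_nn::ops::sigmoid`:

```rust
use candle_core::{Device, Result, Var};
use candle_nn::ops::sigmoid;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Var::new(&[0.5f32, -1.0, 2.0], &dev)?;
    let y = sigmoid(x.as_tensor())?;
    // Backprop through the op...
    let grads = y.sum_all()?.backward()?;
    let dx = grads.get(&x).expect("missing gradient for x");
    // ...should match the analytic identity s'(x) = s(x) * (1 - s(x));
    // affine(-1.0, 1.0) computes 1 - y.
    let expected = (&y * &y.affine(-1.0, 1.0)?)?;
    println!("{dx}\n{expected}");
    Ok(())
}
```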
Laurent Mazare ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. (#2141)
* Add a toggle to control f16/bf16 gemm precision.

* Use the faster variant in the quantized example.

* Bugfix.
2024-04-29 09:21:07 +02:00
Laurent Mazare 287013ef28
Add a forward_via_f16 method to the qmatmul op. (#2138) 2024-04-28 20:35:01 +02:00
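A combined sketch of the two entries above, assuming the toggle is exposed as `candle_core::cuda::set_gemm_reduced_precision_f16` alongside the f32 variant:

```rust
use candle_core::quantized::QMatMul;
use candle_core::{Result, Tensor};

fn qmatmul_f16(qm: &QMatMul, xs: &Tensor) -> Result<Tensor> {
    // Allow f16 accumulation in the underlying gemm (speed over precision).
    candle_core::cuda::set_gemm_reduced_precision_f16(true);
    // Dequantize to f16 and run a regular matmul rather than the
    // integer kernels.
    qm.forward_via_f16(xs)
}
```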
Laurent Mazare eb26e2467e
Add the cuda dequantize f16 kernels. (#2137)
* Add the cuda dequantize f16 kernels.

* Expose the cuda kernels.

* Add some testing + fix.

* Test the other cases too.

* A few more tests.

* Add an environment variable to enable the dequantize f16 + matmul behavior.
2024-04-28 20:05:05 +02:00
hardlydearly c68ed8963f
chore: fix some typos in comments (#2121)
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00
Laurent Mazare e5c8b88f90
Apply the cast before the scaling. (#2135) 2024-04-28 08:30:35 +02:00
Laurent Mazare 805f3be8e1
Add a sort function. (#2134) 2024-04-28 08:18:04 +02:00
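A sketch of the sort entry above, assuming it landed as `Tensor::sort_last_dim(ascending)` returning the sorted values together with the permutation indices:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let t = Tensor::new(&[3f32, 1.0, 4.0, 1.0, 5.0], &dev)?;
    // Sort along the last dimension; also returns the argsort indices.
    let (sorted, indices) = t.sort_last_dim(true)?;
    println!("{sorted}\n{indices}");
    Ok(())
}
```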
Laurent Mazare 3b429f3023
Make the dtype configurable for phi. (#2133) 2024-04-27 21:32:49 +02:00