Eric Buehler
9182c828e6
Automatically upcast for to_u64 ( #2244 )
2024-06-04 11:32:36 +02:00
Taylor Ninesling
3f13ad3d79
Fix dataset id for MNIST ( #2238 )
2024-06-04 06:27:24 +02:00
chenwanqq
cd4d941ed1
Add LLaVA support ( #2234 )
...
* first commit
* llava
* clippy and fmt
* some fixes
* minor fixes
* remove useless file
* refactor: Remove llava/constants.rs and update llava/mod.rs
* modify variable name
* modify code after clippy
* Minor tweaks.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
mokulus
03344d3c19
ONNX: Add Floor and Ceil ( #2235 )
2024-06-02 21:45:20 +02:00
Lionel Touati
1ec3b2cc18
add where_cond f32 for metal ( #2236 )
2024-06-02 14:30:06 +02:00
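The where_cond op backing these kernels selects elementwise between two tensors based on a mask. A minimal sketch, assuming the current candle_core API:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    // Swap in Device::new_metal(0)? to exercise the metal kernels.
    let device = Device::Cpu;
    let mask = Tensor::new(&[1u8, 0, 1, 0], &device)?;
    let on_true = Tensor::new(&[1.0f32, 2.0, 3.0, 4.0], &device)?;
    let on_false = Tensor::new(&[10.0f32, 20.0, 30.0, 40.0], &device)?;
    // Picks on_true where the mask is non-zero, on_false elsewhere.
    let out = mask.where_cond(&on_true, &on_false)?;
    println!("{out}"); // [1., 20., 3., 40.]
    Ok(())
}
```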
Laurent Mazare
f7773d498a
Deactivate some book test that breaks the CI. ( #2233 )
...
* Deactivate some book test that breaks the CI.
* Clippy fix.
2024-06-01 09:44:22 +02:00
Eric Buehler
7abc3b8cd7
Bump cudarc version to 0.11.4 ( #2230 )
2024-06-01 08:18:35 +02:00
Laurent Mazare
46012ed31f
Another cudarc update. ( #2229 )
2024-05-30 22:27:06 +02:00
Laurent Mazare
f3fade3b03
Update cudarc to 0.11.2. ( #2227 )
2024-05-29 18:50:52 +02:00
Dave Lage
ea260aeffd
Add Debug, Clone, Deserialize to moondream config ( #2222 )
2024-05-28 06:08:00 +02:00
Laurent Mazare
0814dfd148
Add a metal kernel for col2im1d. ( #2214 )
...
* Add a metal kernel for col2im1d.
* Enable the col2im variant.
* Bugfix.
* Revert the quantized tweak.
2024-05-25 11:03:23 +02:00
Laurent Mazare
3ceca9901a
Enable the new layer-norm. ( #2213 )
...
* Enable the new layer-norm.
* Shape fixes.
2024-05-24 16:48:21 +02:00
Laurent Mazare
1df2bddccf
Add the layernorm specialized op. ( #2212 )
...
* Add the layernorm cuda kernels.
* Dedicated layer norm op.
* Add the slower variant.
* Plug the cuda implementation.
* Add the metal variant.
* Add a dedicated test.
* Bugfix.
2024-05-24 15:58:01 +02:00
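The specialized op sits behind the usual layer-norm module. A minimal usage sketch with hand-built parameters (real models load them from a VarBuilder), following the candle_nn API:

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::{LayerNorm, Module};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let hidden = 4;
    // Scale and shift parameters applied to the normalized activations.
    let weight = Tensor::ones(hidden, DType::F32, &device)?;
    let bias = Tensor::zeros(hidden, DType::F32, &device)?;
    let ln = LayerNorm::new(weight, bias, 1e-5);
    let x = Tensor::randn(0f32, 1f32, (2, hidden), &device)?;
    // Normalizes over the last dimension, then applies weight and bias.
    let y = ln.forward(&x)?;
    println!("{y}");
    Ok(())
}
```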
Laurent Mazare
6f0b807ffd
More efficient cuda implementation for ConvTranspose1d. ( #2211 )
...
* More efficient cuda implementation for ConvTranspose1d.
* Small tweak.
2024-05-24 11:05:43 +02:00
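For reference, a transposed 1d convolution upsamples along the length dimension. A sketch of the tensor-level call; the argument order (padding, output_padding, stride, dilation, groups) is assumed here:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu; // the CUDA path is what this change speeds up
    // Input: (batch, channels_in, length); kernel: (channels_in, channels_out, k).
    let x = Tensor::randn(0f32, 1f32, (1, 4, 8), &device)?;
    let kernel = Tensor::randn(0f32, 1f32, (4, 2, 3), &device)?;
    let y = x.conv_transpose1d(&kernel, 0, 0, 2, 1, 1)?;
    // Output length: (8 - 1) * stride + dilation * (k - 1) + 1 = 17.
    println!("{:?}", y.dims()); // [1, 2, 17]
    Ok(())
}
```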
Laurent Mazare
d54e02d73d
Avoid a contiguous call in the quantized phi 3 model. ( #2209 )
...
* Simplify the KvCache api.
* Avoid a contiguous call in the quantized phi3 model.
2024-05-23 21:24:55 +02:00
Laurent Mazare
45e235a747
Simplify the KvCache api. ( #2207 )
2024-05-23 17:07:21 +02:00
Laurent Mazare
31cf64147b
Add a couple kv-cache helper functions. ( #2206 )
2024-05-23 16:21:47 +02:00
Jani Monoses
77ea479a18
Add Phi-3 Medium ( #2205 )
2024-05-23 13:33:17 +02:00
Laurent Mazare
72e7ca529a
Add some missing where-cond kernels for metal. ( #2203 )
2024-05-22 09:44:52 +02:00
mokulus
7ff921c538
Add RandomNormal ONNX operator ( #2200 )
2024-05-21 21:47:32 +02:00
Laurent Mazare
9b8537a62f
Remove the deprecated wav crate in favor of hound. ( #2202 )
2024-05-21 21:43:35 +02:00
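Reading audio through hound is straightforward; a minimal sketch with a hypothetical file name:

```rust
fn main() -> Result<(), hound::Error> {
    let mut reader = hound::WavReader::open("audio.wav")?;
    let spec = reader.spec();
    println!("{} Hz, {} channel(s)", spec.sample_rate, spec.channels);
    // Decode 16-bit PCM samples and normalize them to f32 in [-1, 1].
    let samples: Vec<f32> = reader
        .samples::<i16>()
        .map(|s| s.map(|s| s as f32 / i16::MAX as f32))
        .collect::<Result<_, _>>()?;
    println!("{} samples", samples.len());
    Ok(())
}
```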
Laurent Mazare
7ebc3548e1
Use flash-attn in gemma. ( #2195 )
...
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
Laurent Mazare
eefc1c77ef
Support flash-attn in quantized phi3. ( #2194 )
2024-05-18 17:12:56 +02:00
Laurent Mazare
01545f7303
Add a slice_set op. ( #2193 )
...
* Add a slice_set op.
* Add some testing.
* Add the dedicated kv-cache module.
* Derive debug and clone.
* Expose more kv-cache functions.
* Return the current data when appending.
* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
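A minimal sketch of the two pieces this commit introduces, with signatures assumed from the current candle/candle_nn API: slice_set writes a tensor into a preallocated buffer, and the kv-cache module uses it to append keys and values without reallocating:

```rust
use candle_core::{DType, Device, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    // slice_set copies `src` into `buf` at offset 3 along dim 1.
    let buf = Tensor::zeros((1, 8, 4), DType::F32, &device)?;
    let src = Tensor::ones((1, 2, 4), DType::F32, &device)?;
    buf.slice_set(&src, 1, 3)?;

    // The kv-cache concatenates along a chosen dim up to a max length;
    // append returns the cache contents accumulated so far.
    let mut cache = KvCache::new(2 /* seq dim */, 16 /* max seq len */);
    let k = Tensor::randn(0f32, 1f32, (1, 4, 2, 64), &device)?;
    let v = k.clone();
    let (k_all, v_all) = cache.append(&k, &v)?;
    println!("{:?} {:?}", k_all.dims(), v_all.dims());
    Ok(())
}
```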
Yin Guobing
349c3e806a
Support embedding model gte-Qwen1.5-7B-instruct ( #2190 )
...
* Support embedding model gte-Qwen1.5-7B-instruct
This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit brings in
minimal modifications to the old Qwen2 implementation to support both
models.
An example is provided and has been verified against the official
PyTorch implementation.
* Avoid doing the 'last-token filtering' based on the absence of attention mask.
---------
Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
Martin Stefcek
bdaa34216a
chore: add a fix for cudarc on Windows to the readme ( #2189 )
2024-05-16 14:32:50 +02:00
Daniel Varga
cc80e065e5
Allow the threshold argument to be negative in the segment-anything example ( #2187 )
...
The threshold defaults to 0.0; negative values include more points and
expand the mask, while positive values are pickier and shrink it.
Negative numbers start with a minus sign, which clap would normally
parse as a flag.
2024-05-15 13:17:20 +02:00
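One way to get this behavior with clap's derive API (a sketch of the general technique, not necessarily the exact change in the PR):

```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// Lower values expand the mask, higher values shrink it.
    /// allow_negative_numbers lets `--threshold -0.5` parse as a value
    /// rather than clap treating the leading `-` as a new flag.
    #[arg(long, default_value_t = 0.0, allow_negative_numbers = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold = {}", args.threshold);
}
```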
Harry Stern
13c64f6828
Fix VarBuilder::from_slice_safetensors ( #2180 )
...
Also implement SimpleBackend for SliceSafetensors
Signed-off-by: Harry Stern <harry@harrystern.net>
2024-05-12 07:26:06 +02:00
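This makes it possible to build a VarBuilder over safetensors bytes that already live in memory. A sketch, with the dtype/device arguments and the tensor name assumed for illustration:

```rust
use candle_core::{DType, Device};
use candle_nn::VarBuilder;

fn load_from_memory(bytes: &[u8]) -> candle_core::Result<()> {
    let device = Device::Cpu;
    // Borrows the in-memory safetensors data instead of reading a file.
    let vb = VarBuilder::from_slice_safetensors(bytes, DType::F32, &device)?;
    // Hypothetical tensor name; real code uses the names in the checkpoint.
    let _weight = vb.get((768, 768), "model.layer.weight")?;
    Ok(())
}
```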
Laurent Mazare
21f82a5155
Add SliceSafetensors. ( #2179 )
...
* Add SliceSafetensors.
* And add some testing.
2024-05-11 13:15:42 +02:00
Laurent Mazare
9cff7bc3f4
Make it possible to use TF32 accumulation in F32 matmuls. ( #2178 )
...
* Allow the use of tf32 accumulation in matmul.
* Better timings.
* Dummy versions for use when cuda is not enabled.
2024-05-11 12:28:39 +02:00
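TF32 keeps the f32 exponent range but truncates the mantissa, trading a little precision for much faster tensor-core matmuls on Ampere and newer GPUs. A sketch of the toggle, assuming it is exposed under candle_core::cuda as the commit bullets suggest (with no-op stubs when the cuda feature is off):

```rust
fn main() {
    // Opt in to TF32 accumulation for f32 matmuls on CUDA.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);
    assert!(candle_core::cuda::gemm_reduced_precision_f32());
    // ... run f32 matmuls as usual.
}
```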
Laurent Mazare
d9bc5ec151
Switch cudarc back to dynamic linking. ( #2176 )
2024-05-09 10:35:44 +02:00
Sidharth Rajaram
84328e2b60
Update cudarc requirement from 0.11.0 to 0.11.1 ( #2174 )
...
* Upgrade the cudarc dependency from v0.11.0 to v0.11.1, as that version resolves a compile-time bug.
See: https://github.com/huggingface/candle/issues/2173
2024-05-08 20:40:36 +02:00
dependabot[bot]
82b641fd27
Update cudarc requirement from 0.10.0 to 0.11.0 ( #2165 )
...
* Update cudarc requirement from 0.10.0 to 0.11.0
Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.10.0...v0.11.0)
---
updated-dependencies:
- dependency-name: cudarc
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* Use the default cuda version.
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-05-06 17:12:14 +02:00
Laurent Mazare
01794dc16e
Use write rather than try-write on the metal rw-locks. ( #2162 )
2024-05-05 07:22:46 +02:00
Laurent Mazare
a75cd8164f
Force the revision for the phi3-llama quantized models. ( #2159 )
2024-05-04 10:41:18 +02:00
Laurent Mazare
b13a82a438
Separate quantized phi-3 implementation. ( #2157 )
...
* Separate quantized phi-3 implementation.
* Integrate the quantized phi3 model.
* Small fixes, get the generation to work properly.
* Keep the old llama implementation around.
* Change the default.
2024-05-04 10:14:57 +02:00
Laurent Mazare
59b18d974e
Pin the version used for the quantized phi 3 gguf file. ( #2156 )
2024-05-03 15:03:22 +02:00
Laurent Mazare
89f53b9d7b
Bump the version number to 0.5.1. ( #2155 )
...
* Bump the version number to 0.5.1.
* Fix clippy lints for 1.78.
* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare
a09d451d11
Support top-k in the llama example. ( #2150 )
2024-05-01 22:25:47 +02:00
Laurent Mazare
fa06f5f5f9
F16/BF16 bugfix (bis). ( #2143 )
...
* F16/BF16 bugfix (bis).
* Another fix.
* Yet another fix.
2024-04-29 14:08:44 +02:00
Laurent Mazare
09d4845aa8
Bugfix the recent f16/bf16 changes. ( #2142 )
2024-04-29 13:30:11 +02:00
Jeffrey Dallatezza
a0d03aded1
Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. ( #2124 )
...
* When converting a tensor to a variable, clone if the tensor is already a variable.
* Add a test to ensure training a batch norm works with VarMaps
---------
Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@Jeffreys-Laptop.local>
2024-04-29 11:21:53 +02:00
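The fix means a Var built from another Var's tensor gets its own storage rather than aliasing it. A small sketch of the behavior:

```rust
use candle_core::{DType, Device, Tensor, Var};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let t = Tensor::zeros((2, 2), DType::F32, &device)?;
    let v1 = Var::from_tensor(&t)?;
    // Previously this could alias v1's storage; after the fix it clones.
    let v2 = Var::from_tensor(v1.as_tensor())?;
    v2.set(&Tensor::ones((2, 2), DType::F32, &device)?)?;
    // v1 stays all zeros, so the two variables train independently.
    println!("{}", v1.as_tensor());
    Ok(())
}
```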
MilkFather
3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op ( #2114 )
...
* add sigmoid op
* small fix
* add as a method on `Tensor`
* implement gradient calculation for sigmoid
* add sigmoid tests
* we should have a specialized op for this
* fix clippy
* fix clippy 2
* Revert all previous commits in favor of a `CustomOp` based solution
* use `CustomOp1` implementation
* fix rustfmt
* experimental add metal impl
* add cuda kernel impl
* fix fmt
* Add a test + reduce some cuda duplication.
---------
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
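The op is exposed as a simple function, and its backward pass uses sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). A minimal sketch:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let x = Tensor::new(&[-2.0f32, 0.0, 2.0], &device)?;
    // The CustomOp-based sigmoid with dedicated cuda/metal kernels.
    let y = candle_nn::ops::sigmoid(&x)?;
    println!("{y}"); // ~[0.1192, 0.5, 0.8808]
    Ok(())
}
```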
Laurent Mazare
ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. ( #2141 )
...
* Add a toggle to control f16/bf16 gemm precision.
* Use the faster variant in the quantized example.
* Bugfix.
2024-04-29 09:21:07 +02:00
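Same idea as the TF32 switch shown earlier, but for half-precision matmuls: accumulate in f16/bf16 instead of f32 for speed at some accuracy cost. A sketch, assuming the toggles live next to the f32 one:

```rust
fn main() {
    candle_core::cuda::set_gemm_reduced_precision_f16(true);
    candle_core::cuda::set_gemm_reduced_precision_bf16(true);
    // Half-precision matmuls now accumulate in reduced precision on CUDA.
}
```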
Laurent Mazare
287013ef28
Add a forward_via_f16 method to the qmatmul op. ( #2138 )
2024-04-28 20:35:01 +02:00
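A sketch of routing a quantized matmul through f16, with a weight quantized on the fly for illustration:

```rust
use candle_core::quantized::{GgmlDType, QMatMul, QTensor};
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let w = Tensor::randn(0f32, 1f32, (64, 256), &device)?;
    let qw = QTensor::quantize(&w, GgmlDType::Q4_0)?;
    let qm = QMatMul::from_qtensor(qw)?;
    let x = Tensor::randn(0f32, 1f32, (1, 256), &device)?;
    // Dequantize and matmul via f16 rather than f32.
    let y = qm.forward_via_f16(&x)?;
    println!("{:?}", y.dims()); // [1, 64]
    Ok(())
}
```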
Laurent Mazare
eb26e2467e
Add the cuda dequantize f16 kernels. ( #2137 )
...
* Add the cuda dequantize f16 kernels.
* Expose the cuda kernels.
* Add some testing + fix.
* Test the other cases too.
* A few more tests.
* Add an environment variable to enable the dequantize f16 + matmul behavior.
2024-04-28 20:05:05 +02:00
hardlydearly
c68ed8963f
chore: fix some typos in comments ( #2121 )
...
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00
Laurent Mazare
e5c8b88f90
Apply the cast before the scaling. ( #2135 )
2024-04-28 08:30:35 +02:00
Laurent Mazare
805f3be8e1
Add a sort function. ( #2134 )
2024-04-28 08:18:04 +02:00
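A sketch of the sort API; the method name and the (values, indices) return pair are assumed from the current Tensor surface:

```rust
use candle_core::{Device, Tensor};

fn main() -> candle_core::Result<()> {
    let device = Device::Cpu;
    let x = Tensor::new(&[3.0f32, 1.0, 4.0, 1.5], &device)?;
    // Sorts along the last dimension, returning values and their indices.
    let (sorted, indices) = x.sort_last_dim(true /* ascending */)?;
    println!("{sorted} {indices}"); // [1., 1.5, 3., 4.], [1, 3, 0, 2]
    Ok(())
}
```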
Laurent Mazare
3b429f3023
Make the dtype configurable for phi. ( #2133 )
2024-04-27 21:32:49 +02:00