Commit Graph

2161 Commits

Author SHA1 Message Date
Eric Buehler 9182c828e6
Automatically upcast for to_u64 (#2244) 2024-06-04 11:32:36 +02:00
Taylor Ninesling 3f13ad3d79
Fix dataset id for MNIST (#2238) 2024-06-04 06:27:24 +02:00
chenwanqq cd4d941ed1
Add LLaVA support (#2234)
* first commit

* llava

* clippy and fmt

* some fixes

* minor fixes

* remove useless file

* refactor: Remove llava/constants.rs and update llava/mod.rs

* modify variable name

* modify code after clippy

* Minor tweaks.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-06-03 11:54:09 +02:00
mokulus 03344d3c19
ONNX: Add Floor and Ceil (#2235) 2024-06-02 21:45:20 +02:00
Lionel Touati 1ec3b2cc18
add where_cond f32 for metal (#2236) 2024-06-02 14:30:06 +02:00
Laurent Mazare f7773d498a
Deactivate some book tests that break the CI. (#2233)
* Deactivate some book tests that break the CI.

* Clippy fix.
2024-06-01 09:44:22 +02:00
Eric Buehler 7abc3b8cd7
Bump cudarc version to 0.11.4 (#2230) 2024-06-01 08:18:35 +02:00
Laurent Mazare 46012ed31f
Another cudarc update. (#2229) 2024-05-30 22:27:06 +02:00
Laurent Mazare f3fade3b03
Update cudarc to 0.11.2. (#2227) 2024-05-29 18:50:52 +02:00
Dave Lage ea260aeffd
Add Debug, Clone, Deserialize to moondream config (#2222) 2024-05-28 06:08:00 +02:00
Laurent Mazare 0814dfd148
Add a metal kernel for col2im1d. (#2214)
* Add a metal kernel for col2im1d.

* Enable the col2im variant.

* Bugfix.

* Revert the quantized tweak.
2024-05-25 11:03:23 +02:00
Laurent Mazare 3ceca9901a
Enable the new layer-norm. (#2213)
* Enable the new layer-norm.

* Shape fixes.
2024-05-24 16:48:21 +02:00
Laurent Mazare 1df2bddccf
Add the layernorm specialized op. (#2212)
* Add the layernorm cuda kernels.

* Dedicated layer norm op.

* Add the slower variant.

* Plug the cuda implementation.

* Add the metal variant.

* Add a dedicated test.

* Bugfix.
2024-05-24 15:58:01 +02:00
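For context on the layer-norm entries above: the fused cuda/metal kernels plug in underneath the existing layer, so usage at the `candle_nn::LayerNorm` level is unchanged. A minimal sketch (illustrative only, not code from the PR):

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::{LayerNorm, Module};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let hidden = 64;
    // Scale and shift parameters, plus epsilon for numerical stability.
    let ln = LayerNorm::new(
        Tensor::ones(hidden, DType::F32, &dev)?,
        Tensor::zeros(hidden, DType::F32, &dev)?,
        1e-5,
    );
    let xs = Tensor::randn(0f32, 1.0, (2, 7, hidden), &dev)?;
    // The specialized op (when available) is used behind this call.
    let _ys = ln.forward(&xs)?;
    Ok(())
}
```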
Laurent Mazare 6f0b807ffd
More efficient cuda implementation for ConvTranspose1d. (#2211)
* More efficient cuda implementation for ConvTranspose1d.

* Small tweak.
2024-05-24 11:05:43 +02:00
Laurent Mazare d54e02d73d
Avoid a contiguous call in the quantized phi 3 model. (#2209)
* Simplify the KvCache api.

* Avoid a contiguous call in the quantized phi3 model.
2024-05-23 21:24:55 +02:00
Laurent Mazare 45e235a747
Simplify the KvCache api. (#2207) 2024-05-23 17:07:21 +02:00
Laurent Mazare 31cf64147b
Add a couple kv-cache helper functions. (#2206) 2024-05-23 16:21:47 +02:00
Jani Monoses 77ea479a18
Add Phi-3 Medium (#2205) 2024-05-23 13:33:17 +02:00
Laurent Mazare 72e7ca529a
Add some missing where-cond kernels for metal. (#2203) 2024-05-22 09:44:52 +02:00
mokulus 7ff921c538
Add RandomNormal ONNX operator (#2200) 2024-05-21 21:47:32 +02:00
Laurent Mazare 9b8537a62f
Remove the deprecated wav crate in favor of hound. (#2202) 2024-05-21 21:43:35 +02:00
Laurent Mazare 7ebc3548e1
Use flash-attn in gemma. (#2195)
* Use flash-attn in gemma.

* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
Laurent Mazare eefc1c77ef
Support flash-attn in quantized phi3. (#2194) 2024-05-18 17:12:56 +02:00
Laurent Mazare 01545f7303
Add a slice_set op. (#2193)
* Add a slice_set op.

* Add some testing.

* Add the dedicated kv-cache module.

* Derive debug and clone.

* Expose more kv-cache functions.

* Return the current data when appending.

* Use the new cache in the quantized phi3 model.
2024-05-18 15:58:18 +02:00
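A minimal sketch of the dedicated kv-cache module described above, assuming `candle_nn::kv_cache::KvCache` exposes `new(dim, max_seq_len)` and an `append` that returns the data accumulated so far (per the "Return the current data when appending" note); shapes are illustrative:

```rust
use candle_core::{DType, Device, Result, Tensor};
use candle_nn::kv_cache::KvCache;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    // Concatenate along dim 2 (the sequence axis), up to 512 positions.
    let mut cache = KvCache::new(2, 512);
    // Keys/values shaped (batch, heads, seq, head_dim).
    let k = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    let v = Tensor::zeros((1, 8, 4, 64), DType::F32, &dev)?;
    // Each append copies the new chunk in (slice_set under the hood) and
    // returns the keys/values accumulated so far.
    let (k_all, v_all) = cache.append(&k, &v)?;
    assert_eq!(k_all.dims(), v_all.dims());
    Ok(())
}
```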
Yin Guobing 349c3e806a
Support embedding model gte-Qwen1.5-7B-instruct (#2190)
* Support embedding model gte-Qwen1.5-7B-instruct

This is a text embedding model based on Qwen2. The two share the same
model architecture except for the last MLP module. This commit makes
minimal modifications to the old Qwen2 implementation to support both
models.

An example is provided and has been verified against the official
PyTorch implementation.

* Avoid doing the 'last-token filtering' based on the absence of an attention mask.

---------

Co-authored-by: Laurent <laurent.mazare@gmail.com>
2024-05-16 21:34:10 +02:00
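The "last-token filtering" mentioned above refers to pooling one embedding per sequence from the final hidden state. A hedged sketch of the idea, not the exact code from the example:

```rust
use candle_core::{Result, Tensor};

/// Reduce (batch, seq_len, hidden) hidden states to one embedding per
/// sequence by keeping only the final position.
fn last_token_pool(hidden: &Tensor) -> Result<Tensor> {
    let (_batch, seq_len, _hidden) = hidden.dims3()?;
    hidden.narrow(1, seq_len - 1, 1)?.squeeze(1)
}
```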
Martin Stefcek bdaa34216a
chore: add fix for windows cudarc into the readme (#2189) 2024-05-16 14:32:50 +02:00
Daniel Varga cc80e065e5
Allow the threshold argument to be negative in the segment-anything example (#2187)
The threshold is 0.0 by default; negative values include more points,
expanding the mask, while positive values are more selective, making the
mask smaller.

Negative numbers start with a minus sign, which clap would normally
interpret as a flag.
2024-05-15 13:17:20 +02:00
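One way to handle the minus-sign ambiguity is clap's `allow_negative_numbers` setting. A minimal sketch with the derive API (the argument name here is illustrative):

```rust
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// 0.0 by default; negative values expand the mask, positive ones shrink it.
    #[arg(long, default_value_t = 0.0, allow_negative_numbers = true)]
    threshold: f32,
}

fn main() {
    let args = Args::parse();
    println!("threshold = {}", args.threshold);
}
```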
Harry Stern 13c64f6828
Fix VarBuilder::from_slice_safetensors (#2180)
Also implement SimpleBackend for SliceSafetensors

Signed-off-by: Harry Stern <harry@harrystern.net>
2024-05-12 07:26:06 +02:00
Laurent Mazare 21f82a5155
Add SliceSafetensors. (#2179)
* Add SliceSafetensors.

* And add some testing.
2024-05-11 13:15:42 +02:00
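A short sketch of the new entry point, assuming the signature `VarBuilder::from_slice_safetensors(bytes, dtype, device)`; the tensor name `"weight"` is hypothetical:

```rust
use candle_core::{DType, Device, Result};
use candle_nn::VarBuilder;

fn weights_from_bytes(bytes: &[u8]) -> Result<()> {
    // Build a VarBuilder over an in-memory safetensors buffer instead of
    // a file on disk.
    let vb = VarBuilder::from_slice_safetensors(bytes, DType::F32, &Device::Cpu)?;
    let _w = vb.get((16, 16), "weight")?;
    Ok(())
}
```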
Laurent Mazare 9cff7bc3f4
Make it possible to use TF32 accumulation in F32 matmuls. (#2178)
* Allow the use of tf32 accumulation in matmul.

* Better timings.

* Dummy versions for use when cuda is not enabled.
2024-05-11 12:28:39 +02:00
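A sketch of opting in to TF32 accumulation, assuming the toggle is exposed as `candle_core::cuda::set_gemm_reduced_precision_f32` (the commit notes that dummy versions exist when cuda is disabled, so the call should compile either way):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // Trade a little precision for speed on Ampere+ GPUs; a no-op dummy
    // is used when the cuda feature is not enabled.
    candle_core::cuda::set_gemm_reduced_precision_f32(true);

    let dev = Device::Cpu;
    let a = Tensor::randn(0f32, 1.0, (64, 64), &dev)?;
    let b = Tensor::randn(0f32, 1.0, (64, 64), &dev)?;
    let _c = a.matmul(&b)?;
    Ok(())
}
```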
Laurent Mazare d9bc5ec151
Switch cudarc back to dynamic linking. (#2176) 2024-05-09 10:35:44 +02:00
Sidharth Rajaram 84328e2b60
Update cudarc requirement from 0.11.0 to 0.11.1 (#2174)
* Upgrade the cudarc dependency from v0.11.0 to v0.11.1, as that version resolves a compile-time bug.

See: https://github.com/huggingface/candle/issues/2173
2024-05-08 20:40:36 +02:00
dependabot[bot] 82b641fd27
Update cudarc requirement from 0.10.0 to 0.11.0 (#2165)
* Update cudarc requirement from 0.10.0 to 0.11.0

Updates the requirements on [cudarc](https://github.com/coreylowman/cudarc) to permit the latest version.
- [Release notes](https://github.com/coreylowman/cudarc/releases)
- [Commits](https://github.com/coreylowman/cudarc/compare/v0.10.0...v0.11.0)

---
updated-dependencies:
- dependency-name: cudarc
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Use the default cuda version.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-05-06 17:12:14 +02:00
Laurent Mazare 01794dc16e
Use write rather than try-write on the metal rw-locks. (#2162) 2024-05-05 07:22:46 +02:00
Laurent Mazare a75cd8164f
Force the revision for the phi3-llama quantized models. (#2159) 2024-05-04 10:41:18 +02:00
Laurent Mazare b13a82a438
Separate quantized phi-3 implementation. (#2157)
* Separate quantized phi-3 implementation.

* Integrate the quantized phi3 model.

* Small fixes, get the generation to work properly.

* Keep the old llama implementation around.

* Change the default.
2024-05-04 10:14:57 +02:00
Laurent Mazare 59b18d974e
Pin the version used for the quantized phi 3 gguf file. (#2156) 2024-05-03 15:03:22 +02:00
Laurent Mazare 89f53b9d7b
Bump the version number to 0.5.1. (#2155)
* Bump the version number to 0.5.1.

* Fix clippy lints for 1.78.

* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare a09d451d11
Support top-k in the llama example. (#2150) 2024-05-01 22:25:47 +02:00
Laurent Mazare fa06f5f5f9
F16/BF16 bugfix (bis). (#2143)
* F16/BF16 bugfix (bis).

* Another fix.

* Yet another fix.
2024-04-29 14:08:44 +02:00
Laurent Mazare 09d4845aa8
Bugfix the recent f16/bf16 changes. (#2142) 2024-04-29 13:30:11 +02:00
Jeffrey Dallatezza a0d03aded1
Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. (#2124)
* When converting a tensor to a variable, clone if the tensor is already a variable.

* Add a test to ensure training a batch norm works with VarMaps

---------

Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@Jeffreys-Laptop.local>
2024-04-29 11:21:53 +02:00
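A sketch of the aliasing bug this fixes: with the clone in place, a `Var` built from another variable's tensor gets its own storage (shapes and values illustrative):

```rust
use candle_core::{DType, Device, Result, Tensor, Var};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let v1 = Var::zeros((2, 2), DType::F32, &dev)?;
    // Building a Var from an existing Var's tensor now clones the storage,
    // so updating v2 no longer silently mutates v1.
    let v2 = Var::from_tensor(v1.as_tensor())?;
    v2.set(&Tensor::ones((2, 2), DType::F32, &dev)?)?;
    // v1 keeps its original zeros.
    assert_eq!(v1.to_vec2::<f32>()?, vec![vec![0.0, 0.0], vec![0.0, 0.0]]);
    Ok(())
}
```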
MilkFather 3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114)
* add sigmoid op

* small fix

* add as a method on `Tensor`

* implement gradient calculation for sigmoid

* add sigmoid tests

* we should have a specialized op for this

* fix clippy

* fix clippy 2

* Revert all previous commits in favor of a `CustomOp` based solution

* use `CustomOp1` implementation

* fix rustfmt

* experimental add metal impl

* add cuda kernel impl

* fix fmt

* Add a test + reduce some cuda duplication.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
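The gradient being fixed here is the identity s'(x) = s(x) * (1 - s(x)). A hedged sketch checking it through backprop, assuming the op is exposed as `candle_nn::ops::sigmoid`:

```rust
use candle_core::{Device, Result, Var};
use candle_nn::ops::sigmoid;

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let x = Var::new(&[0.5f32, -1.0, 2.0], &dev)?;
    let y = sigmoid(x.as_tensor())?;
    // Backprop through the op...
    let grads = y.sum_all()?.backward()?;
    let dx = grads.get(&x).expect("missing gradient for x");
    // ...should match the analytic identity s'(x) = s(x) * (1 - s(x));
    // affine(-1.0, 1.0) computes 1 - y.
    let expected = (&y * &y.affine(-1.0, 1.0)?)?;
    println!("{dx}\n{expected}");
    Ok(())
}
```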
Laurent Mazare ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. (#2141)
* Add a toggle to control f16/bf16 gemm precision.

* Use the faster variant in the quantized example.

* Bugfix.
2024-04-29 09:21:07 +02:00
Laurent Mazare 287013ef28
Add a forward_via_f16 method to the qmatmul op. (#2138) 2024-04-28 20:35:01 +02:00
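A combined sketch of the two entries above, assuming the toggle is exposed as `candle_core::cuda::set_gemm_reduced_precision_f16` alongside the f32 variant:

```rust
use candle_core::quantized::QMatMul;
use candle_core::{Result, Tensor};

fn qmatmul_f16(qm: &QMatMul, xs: &Tensor) -> Result<Tensor> {
    // Allow f16 accumulation in the underlying gemm (speed over precision).
    candle_core::cuda::set_gemm_reduced_precision_f16(true);
    // Dequantize to f16 and run a regular matmul rather than the
    // integer kernels.
    qm.forward_via_f16(xs)
}
```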
Laurent Mazare eb26e2467e
Add the cuda dequantize f16 kernels. (#2137)
* Add the cuda dequantize f16 kernels.

* Expose the cuda kernels.

* Add some testing + fix.

* Test the other cases too.

* A few more tests.

* Add an environment variable to enable the dequantize f16 + matmul behavior.
2024-04-28 20:05:05 +02:00
hardlydearly c68ed8963f
chore: fix some typos in comments (#2121)
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00
Laurent Mazare e5c8b88f90
Apply the cast before the scaling. (#2135) 2024-04-28 08:30:35 +02:00
Laurent Mazare 805f3be8e1
Add a sort function. (#2134) 2024-04-28 08:18:04 +02:00
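A sketch of the sort entry above, assuming it landed as `Tensor::sort_last_dim(ascending)` returning the sorted values together with the permutation indices:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::Cpu;
    let t = Tensor::new(&[3f32, 1.0, 4.0, 1.0, 5.0], &dev)?;
    // Sort along the last dimension; also returns the argsort indices.
    let (sorted, indices) = t.sort_last_dim(true)?;
    println!("{sorted}\n{indices}");
    Ok(())
}
```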
Laurent Mazare 3b429f3023
Make the dtype configurable for phi. (#2133) 2024-04-27 21:32:49 +02:00