Commit Graph

2028 Commits

Author SHA1 Message Date
laurent f7980abbcd
Improve the sampling methods. 2024-05-04 10:53:30 +02:00
Laurent Mazare a75cd8164f
Force the revision for the phi3-llama quantized models. (#2159) 2024-05-04 10:41:18 +02:00
Laurent Mazare b13a82a438
Separate quantized phi-3 implementation. (#2157)
* Separate quantized phi-3 implementation.

* Integrate the quantized phi3 model.

* Small fixes, get the generation to work properly.

* Keep the old llama implementation around.

* Change the default.
2024-05-04 10:14:57 +02:00
Laurent Mazare 59b18d974e
Pin the version used for the quantized phi 3 gguf file. (#2156) 2024-05-03 15:03:22 +02:00
Laurent Mazare 89f53b9d7b
Bump the version number to 0.5.1. (#2155)
* Bump the version number to 0.5.1.

* Fix clippy lints for 1.78.

* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare a09d451d11
Support top-k in the llama example. (#2150) 2024-05-01 22:25:47 +02:00
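
A minimal sketch of driving this top-k option from user code, assuming the `Sampling::TopK` variant and the `LogitsProcessor::from_sampling` constructor from candle-transformers' generation utilities (field names and exact paths are assumptions):

```rust
use candle_core::{Device, Result, Tensor};
use candle_transformers::generation::{LogitsProcessor, Sampling};

fn main() -> Result<()> {
    // Raw logits for a single decoding step over a 5-token vocabulary.
    let logits = Tensor::new(&[0.1f32, 2.0, 0.3, 1.5, 0.7], &Device::Cpu)?;
    // Keep only the k=2 most likely tokens, then sample with temperature 0.8.
    let mut processor =
        LogitsProcessor::from_sampling(42, Sampling::TopK { k: 2, temperature: 0.8 });
    let token = processor.sample(&logits)?;
    println!("sampled token id: {token}");
    Ok(())
}
```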
Laurent Mazare fa06f5f5f9
F16/BF16 bugfix (bis). (#2143)
* F16/BF16 bugfix (bis).

* Another fix.

* Yet another fix.
2024-04-29 14:08:44 +02:00
Laurent Mazare 09d4845aa8
Bugfix the recent f16/bf16 changes. (#2142) 2024-04-29 13:30:11 +02:00
Jeffrey Dallatezza a0d03aded1
Bug Fix: When converting a tensor to a variable, clone if the tensor is already a variable. (#2124)
* When converting a tensor to a variable, clone if the tensor is already a variable.

* Add a test to ensure training a batch norm works with VarMaps

---------

Co-authored-by: Jeffrey Dallatezza <jeffreydallatezza@Jeffreys-Laptop.local>
2024-04-29 11:21:53 +02:00
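
A small sketch of the conversion this fix touches, assuming the usual `Var::from_tensor` API from candle-core:

```rust
use candle_core::{Device, Result, Tensor, Var};

fn main() -> Result<()> {
    let t = Tensor::new(&[1f32, 2., 3.], &Device::Cpu)?;
    let v1 = Var::from_tensor(&t)?;
    // The case this PR fixes: converting a tensor that is already a variable.
    // After the fix the data is cloned, so v2 is independent of v1 rather
    // than an alias of the same storage.
    let v2 = Var::from_tensor(v1.as_tensor())?;
    println!("{} {}", v1.as_tensor(), v2.as_tensor());
    Ok(())
}
```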
MilkFather 3bbb88fcb4
Fix sigmoid gradient calculation and move sigmoid into a specialized op (#2114)
* add sigmoid op

* small fix

* add as a method on `Tensor`

* implement gradient calculation for sigmoid

* add sigmoid tests

* we should have a specialized op for this

* fix clippy

* fix clippy 2

* Revert all previous commits in favor of a `CustomOp` based solution

* use `CustomOp1` implementation

* fix rustfmt

* experimental add metal impl

* add cuda kernel impl

* fix fmt

* Add a test + reduce some cuda duplication.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-29 11:04:43 +02:00
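
Since #2114 reworks both the op and its gradient, a quick numeric check is useful: the sigmoid derivative is σ'(x) = σ(x)·(1 − σ(x)). A hedged sketch comparing candle's autodiff against that identity, assuming `candle_nn::ops::sigmoid` and the usual `Var`/`backward` API:

```rust
use candle_core::{Device, Result, Var};

fn main() -> Result<()> {
    let x = Var::new(&[-2f32, 0., 3.], &Device::Cpu)?;
    let s = candle_nn::ops::sigmoid(&x)?;
    // Backprop through the op and compare with the analytic gradient.
    let grads = s.sum_all()?.backward()?;
    let dx = grads.get(&x).expect("no gradient for x");
    let expected = s.mul(&s.affine(-1.0, 1.0)?)?; // σ(x) * (1 - σ(x))
    println!("autodiff: {dx}\nanalytic: {expected}");
    Ok(())
}
```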
Laurent Mazare ed7b99f525
Add a toggle for F16/BF16 accumulation in gemm. (#2141)
* Add a toggle to control f16/bf16 gemm precision.

* Use the faster variant in the quantized example.

* Bugfix.
2024-04-29 09:21:07 +02:00
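
The toggle trades accuracy for speed by letting half-precision GEMMs accumulate in f16/bf16 instead of f32. A hypothetical usage sketch; the function names below are inferred from the PR title and are assumptions, not a verified candle API:

```rust
// Hypothetical: flip the reduced-precision accumulation toggles from #2141.
// Accumulating in f16/bf16 is faster but less accurate than f32 accumulation.
fn enable_fast_half_gemm() {
    candle_core::cuda::set_gemm_reduced_precision_f16(true);
    candle_core::cuda::set_gemm_reduced_precision_bf16(true);
}
```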
Laurent Mazare 287013ef28
Add a forward_via_f16 method to the qmatmul op. (#2138) 2024-04-28 20:35:01 +02:00
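
The method name comes straight from the commit; a hedged sketch of how it might be called, with the signature assumed to mirror `QMatMul::forward`:

```rust
use candle_core::quantized::QMatMul;
use candle_core::{Result, Tensor};

// Same contract as `QMatMul::forward`, but goes through f16 internally
// (dequantize to f16, then an f16 matmul); the signature is an assumption.
fn project(qm: &QMatMul, xs: &Tensor) -> Result<Tensor> {
    qm.forward_via_f16(xs)
}
```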
Laurent Mazare eb26e2467e
Add the cuda dequantize f16 kernels. (#2137)
* Add the cuda dequantize f16 kernels.

* Expose the cuda kernels.

* Add some testing + fix.

* Test the other cases too.

* A few more tests.

* Add an environment variable to enable the dequantize f16 + matmul behavior.
2024-04-28 20:05:05 +02:00
hardlydearly c68ed8963f
chore: fix some typos in comments (#2121)
Signed-off-by: hardlydearly <799511800@qq.com>
2024-04-28 08:34:32 +02:00
Laurent Mazare e5c8b88f90
Apply the cast before the scaling. (#2135) 2024-04-28 08:30:35 +02:00
Laurent Mazare 805f3be8e1
Add a sort function. (#2134) 2024-04-28 08:18:04 +02:00
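
A minimal sketch of the new sort, assuming `sort_last_dim` returns the sorted values together with the ordering indices:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[3f32, 1., 2.], &Device::Cpu)?;
    // Assumed to return (sorted values, sorting indices) along the last dim.
    let (sorted, indices) = t.sort_last_dim(true)?; // ascending
    println!("{sorted}\n{indices}");
    Ok(())
}
```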
Laurent Mazare 3b429f3023
Make the dtype configurable for phi. (#2133) 2024-04-27 21:32:49 +02:00
Laurent Mazare 96a48e5cc4
Add argsort. (#2132)
* Add the argsort cuda kernels.

* CPU version of arg-sort.

* Hook the cuda kernel + rework the cpu bits.

* Add some dedicated test.

* Working cuda kernel.

* Metal kernel.

* Metal adjustments.

* Bugfix.

* Use the fast rope in qwen.

* Rework the expert selection in qwen.
2024-04-27 20:17:35 +02:00
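
And the companion argsort from #2132: a sketch assuming an `arg_sort_last_dim` method that returns only the sorting indices, leaving the values in place:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let t = Tensor::new(&[[3f32, 1., 2.], [0., 9., 5.]], &Device::Cpu)?;
    // Indices that would sort each row ascending, without moving the values.
    let idx = t.arg_sort_last_dim(true)?;
    println!("{idx}"); // expected: [[1, 2, 0], [0, 2, 1]]
    Ok(())
}
```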
Isotr0py 6cf82fd7a3
Add Olmo models (#2127)
* add olmo support

* add olmo readme

* Fix fmt.

* Fix clippy.

* Get olmo to work on cuda.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-26 11:02:51 +02:00
Laurent Mazare cfab6e7616
Mention phi-v3 in the readmes. (#2122) 2024-04-24 20:54:24 +02:00
Laurent Mazare 11d4a3c588
Add the phi-3 model. (#2120)
* Add the phi-3 model.

* Faster rope.

* Bugfix.

* Fix the detokenization.
2024-04-24 09:48:13 +02:00
Laurent Mazare 9d3f1c8af5
Add the phi-v3 quantized model. (#2118)
* Add the phi-v3 quantized model.

* Also include phi-3 in the main phi example.
2024-04-24 08:22:23 +02:00
Laurent Mazare 7211009179
Fix for rustfmt. (#2117) 2024-04-23 19:09:33 +02:00
B1rtek 6fadaf2eff
candle-onnx: add operators RandomUniform and Exp (#2116)
* Add basic RandomUniform implementation

* Use is_some to check if seed is present

* Added Exp operator implementation

---------

Co-authored-by: Mateusz Okulus <mmokulus@gmail.com>
2024-04-23 19:02:19 +02:00
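
A hedged sketch of exercising the new ops through candle-onnx's evaluator; `read_file` and `simple_eval` are the crate's entry points, while the file name and input name are placeholders:

```rust
use std::collections::HashMap;

use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // "exp.onnx" and the input name "x" are placeholders for a graph that
    // contains one of the newly supported nodes (here, Exp).
    let model = candle_onnx::read_file("exp.onnx")?;
    let mut inputs = HashMap::new();
    inputs.insert("x".to_string(), Tensor::new(&[0f32, 1., 2.], &Device::Cpu)?);
    let outputs = candle_onnx::simple_eval(&model, inputs)?;
    println!("{:?}", outputs.keys()); // y = exp(x) among the graph outputs
    Ok(())
}
```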
Laurent Mazare 8a05743a21
Add StorageRef. (#2113)
* Add the storage-ref bits.

* Add the metal implementation.
2024-04-23 13:23:27 +02:00
Laurent Mazare b2e816752b
Use the faster rms-norm kernel for llama. (#2107)
* Use the faster rms-norm kernel for llama.

* Use the fast variant by default.
2024-04-22 18:52:00 +02:00
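
A minimal sketch of the fused op, assuming the `candle_nn::ops::rms_norm(xs, weight, eps)` form:

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let xs = Tensor::randn(0f32, 1., (2, 8), &Device::Cpu)?;
    let weight = Tensor::ones(8, DType::F32, &Device::Cpu)?;
    // Fused rms-norm: divide each row by its root-mean-square, then scale.
    let ys = candle_nn::ops::rms_norm(&xs, &weight, 1e-5)?;
    println!("{ys}");
    Ok(())
}
```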
Laurent Mazare 618ecf5e23
Better time measurement for the llama example. (#2106) 2024-04-22 17:54:27 +02:00
dependabot[bot] 267601eec1
Update tokenizers requirement from 0.15.0 to 0.19.1 (#2104)
Updates the requirements on [tokenizers](https://github.com/huggingface/tokenizers) to permit the latest version.
- [Release notes](https://github.com/huggingface/tokenizers/releases)
- [Changelog](https://github.com/huggingface/tokenizers/blob/main/RELEASE.md)
- [Commits](https://github.com/huggingface/tokenizers/compare/v0.15.0...v0.15.2)

---
updated-dependencies:
- dependency-name: tokenizers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-22 17:10:46 +02:00
dependabot[bot] 08a15cb79e
Update zip requirement from 0.6.6 to 1.1.1 (#2103)
* Update zip requirement from 0.6.6 to 1.1.1

---
updated-dependencies:
- dependency-name: zip
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix for the zip crate update.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-22 16:23:27 +02:00
Laurent Mazare c388be93e7
Updated quantized phi model (#2099)
* Quantized phi in a separate file.

* Add the quantized phi example + rework the model code.

* Improve the phi model.

* Get some generation out.

* Use the appropriate rope shape.

* Tweak the default prompt.

---------

Co-authored-by: Jane Doe <jane.doe@example.org>
2024-04-21 07:37:07 +02:00
Santiago Medina d22f1d4f4e
Derive clone and debug traits for Moondream model (#2100)
* moondream implementation

* add moondream example

* change config default activation

* Add assets and integrate phi mixformer with example

* Make use of kv cache and fix seq_len bug; Clean up example code

* Add README link to example

* Remove pos_embed scaling; Remove assets; Add to README; Expand VisionConfig

* Delete image

* Use apply instead of forward

* Use latest release special token; Fix token/s accuracy; Use GeluPytorchTanh in VisionConfig v2

* Derive debug and clone traits for Moondream model.
2024-04-21 07:08:28 +02:00
Thomas Santerre 0067fe00a8
Metal Unary: Add benchmarks and process kernels in a tile-based fashion (#2056)
* add basic unary bench for sqrt

* process unary commands in tiles of 4

* re-enable all benchmarks

* rename helper to unary

* modify approach to split up tiled and non-tiled operations

* undo bench ignore for other tests

* update tile size to 2

* only perform the optimization on the contiguous even numbered element case
2024-04-21 00:10:33 +02:00
Laurent Mazare 587ee3bb6f
Small cleanups to the llama multi-process example. (#2098) 2024-04-20 22:19:46 +02:00
Laurent Mazare dd78422701
Handle multiple dimensions in metal QMM + two fixes. (#2097) 2024-04-20 18:55:45 +02:00
Gabriel 9215e9ce8c
Add missing onnx operations (#2096)
* Add missing onnx operations

* Add tests and fix errors

* Run rustfmt
2024-04-20 18:44:22 +02:00
Laurent Mazare 52ae332910
Use llama v3 by default + add to readme. (#2094) 2024-04-20 16:11:24 +02:00
Laurent Mazare 8b390ddd29
Only download the weights in the main process (and not in the child processes). (#2093) 2024-04-20 13:01:23 +02:00
Laurent Mazare c97d639fa0
Multiprocess/multi-GPU support for llama 3. (#2092)
* Multiprocess/multi-GPU support for llama 3.

* Modernize the mp example a bit.
2024-04-20 12:49:21 +02:00
Laurent Mazare b45c710dbf
Fix for gemma MQA. (#2091) 2024-04-19 21:49:55 +02:00
Laurent Mazare 9c532aef47
Also enable llama-v3 8b instruct. (#2088) 2024-04-19 08:50:06 +02:00
Thomas Santerre f7a6468238
Add support for llama3 on the quantized example (#2086)
* add support for l3b, new tokenizer

* add todo

* Add todo and use k_s model

* Use the official tokenizers.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-18 22:52:00 +02:00
Laurent Mazare 2b93dffe64
Use faster rotary embeddings for llama like models. (#2087) 2024-04-18 22:34:29 +02:00
Laurent Mazare e6ee7ba4d4
Llama v3. (#2085)
* Llama v3.

* Tweak the default params + handle special tokens.

* Small tweak.
2024-04-18 22:19:54 +02:00
Laurent Mazare 1690ab45d2
Fix the silu gradient issue on 0. (#2083) 2024-04-18 14:31:41 +02:00
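
For reference: silu(x) = x·σ(x), so silu′(x) = σ(x)·(1 + x·(1 − σ(x))), which evaluates to 0.5 at x = 0, the value the gradient should take at exactly zero.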
Laurent Mazare 8de0ce6cba
Add more QMMV cuda kernels. (#2077)
* Add more QMMV cuda kernels.

* Enable the new kernels.

* Adapt the testing.
2024-04-18 08:36:43 +02:00
Laurent Mazare ce6d08df94
Minor fix to the readme. (#2080)
Co-authored-by: Jane Doe <jane.doe@example.org>
2024-04-17 22:43:00 +02:00
Laurent Mazare 2817643db9
Add the mmv kernels for small batch sizes. (#2075)
* Add the mmv kernels for smaller sizes.

* Support more mmv kernels.

* Use the new kernels.

* Fix the call.

* Silly fix.

* Improve the testing.

* Fix for dmmv.

* Add another dedicated test for the batching mmv.
2024-04-16 21:30:51 +02:00
NorilskMajor 4d14777673
Utilize batches in Stable Diffusion (#2071)
* Utilize the batch support that was already present in Stable Diffusion but unused.

Also refactor out the `save_image` function.

* Clippy + cosmetic fixes.

---------

Co-authored-by: laurent <laurent.mazare@gmail.com>
2024-04-16 06:49:04 +02:00
Laurent Mazare f135b7963d
Fix for the batch dim in the quantized matmul example. (#2073)
* Fix for the batch dim in the quantized matmul example.

* Enable more tests on cuda.

* Add a test for qmm with a batch.

* Fix the zeros-dim test on metal.
2024-04-15 20:00:28 +02:00
Laurent Mazare af955f260c
Make the falcon model cloneable. (#2067) 2024-04-15 09:39:03 +02:00