candle

Author	SHA1	Message	Date
Laurent Mazare	d9f9c859af	Add flash attention (#241) * Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab. * More flash attn. * Set up the flash attn parameters. * Get things to compile locally. * Move the flash attention files in a different directory. * Build the static C library with nvcc. * Add more flash attention. * Update the build part. * Better caching. * Exclude flash attention from the default workspace. * Put flash-attn behind a feature gate. * Get the flash attn kernel to run. * Move the flags to a more appropriate place. * Enable flash attention in llama. * Use flash attention in llama.	2023-07-26 07:48:10 +01:00
Laurent Mazare	5a26cba733	Re-organize the wasm examples (#231) * Move the whisper example. * More renaming. * Add llama2 as a new wasm example. * Live generation. * More of the llama wasm example. * Formatting.	2023-07-24 12:36:02 +01:00
Laurent Mazare	dc416243a3	Bump the hf-hub dependency to 0.1.3. (#206)	2023-07-20 07:27:52 +01:00
Laurent Mazare	c34f932319	Fix the mkl build. (#204) * Fix the mkl build. * Fix the build properly.	2023-07-19 19:41:11 +01:00
Nicolas Patry	439321745a	Removing `candle-hub` internal to extract into `hf-hub` standalone.	2023-07-19 15:04:38 +02:00
Laurent Mazare	b8abe2bb4b	Factorize the tokenizers version in the workspace cargo def. (#186)	2023-07-18 06:48:13 +01:00
Laurent Mazare	f0cccd08f0	Bert tracing (#184) * Add some tracing to bert. * More tracing. * Add a flag for tracing.	2023-07-17 19:40:42 +01:00
Laurent Mazare	49ea09c73c	Gemm update (#183) * Update the gemm dependency. * Update the comment too. * Pin the sha256 dependency.	2023-07-17 14:05:39 +01:00
Laurent Mazare	104f89df31	Centralize the dependency versions and inherit them. (#177)	2023-07-16 07:47:17 +01:00
Laurent Mazare	d1f5d44c04	Reenable pyo3 in the workspace list (#170) * Enable pyo3 back. * Adapt the CI.	2023-07-14 19:54:38 +01:00
Nicolas Patry	4ed56d7861	Removing cuda default. Seems very important for a lot of exploring users usually on laptop without GPUs. Adding more README instructions in a follow up.	2023-07-14 16:52:15 +02:00
Laurent Mazare	88f666781f	Wasm proof of concept. (#167) * Wasm proof of concept. * Run whisper inference in the browser. * Some fixes. * Move the wasm example. * Change the tokenizer config.	2023-07-14 14:51:46 +01:00
Laurent Mazare	21aa29ddce	Use a rwlock for inner mutability. (#156) * Use a rw-lock. * Make clippy happier.	2023-07-13 11:25:24 +01:00
Laurent Mazare	50b0946a2d	Tensor mutability (#154) * Working towards tensor mutability. * Use a ref-cell to provide tensor mutability.	2023-07-13 11:04:40 +01:00
Laurent Mazare	ba35d895e7	Sketch the candle-transformers crate. (#147) * Sketch the candle-transformers crate. * Format the empty files.	2023-07-12 13:49:31 +01:00
Laurent Mazare	9ce0f1c010	Sketch the candle-nn crate. (#115) * Sketch the candle-nn crate. * Tweak the cuda dependencies. * More cuda tweaks.	2023-07-10 08:50:09 +01:00
Laurent Mazare	4afa461b34	Sketch the Falcon model. (#93) * Sketch the Falcon model. * Add more substance to the falcon example. * Falcon (wip). * Falcon (wip again). * Falcon inference. * Get the weights from the api and properly generate the model. * Use the proper model. * Fix the file/revision names. * Fix bias handling. * Recompute the rot embeddings. * Fix the input shape. * Add the release-with-debug profile. * Silly bugfix. * More bugfixes. * Stricter shape checking in matmul.	2023-07-06 19:01:21 +01:00
laurent	fdb1acd2ff	Move llama in a cargo-examples directory.	2023-07-03 11:30:58 +01:00
laurent	ebb0fedf14	Very simple pyo3 bindings for candle.	2023-07-01 20:36:44 +01:00
laurent	af66f0829e	Revert the new profile.	2023-06-29 19:08:50 +01:00
laurent	3232df9458	Add some KV cache to llama.	2023-06-29 15:29:40 +01:00
Nicolas Patry	1a82bc50c9	[Tmp] Adding candle-hub	2023-06-27 13:58:23 +02:00
Nicolas Patry	d7f729fb8f	Refactor the hierarchy.	2023-06-27 11:57:27 +02:00
laurent	22da2c7e02	More f16 and bf16 support.	2023-06-26 20:52:01 +01:00
laurent	a31411fd91	Start adding f16/bf16 support.	2023-06-26 19:37:47 +01:00
laurent	11696e6377	Faster model weight loading.	2023-06-26 07:40:11 +01:00
laurent	96c098b6cd	Remove the unecessary features.	2023-06-24 18:15:44 +01:00
laurent	a7f80e258f	Read and write npy files.	2023-06-24 18:12:10 +01:00
Nicolas Patry	04cf14f35a	Moving to `gemm` and adding matmul backprop. - Tentative `T` operator.	2023-06-22 12:37:02 +02:00
Nicolas Patry	9ea220fc6e	Fixing tokenizers dep.	2023-06-22 12:25:58 +02:00
Nicolas Patry	ce977b489e	Adding matmul?	2023-06-22 12:25:58 +02:00
laurent	083ced4428	Integrate the kernels bits.	2023-06-22 09:59:00 +01:00
laurent	7adffafeda	Abstract the gradient storage.	2023-06-21 14:29:48 +01:00
laurent	9698211d56	Add some very basic tensor type.	2023-06-19 17:26:50 +01:00

134 Commits