Commit Graph

50 Commits

Author SHA1 Message Date
Laurent Mazare 8097559c1a
Move the candle version to 0.7.1. (#2495) 2024-09-22 20:44:39 +02:00
Laurent Mazare c2fca0ca11
Bump the crate version. (#2491) 2024-09-21 15:13:12 +02:00
Laurent Mazare 6070278a31
Bump the version to 0.6.1. (#2438) 2024-08-22 09:23:52 +02:00
Laurent Mazare 30cdd769f9
Update the flash attn kernels. (#2333) 2024-07-15 20:37:36 +02:00
Laurent Mazare f65e90e7ef
Bump the crate version. (#2248) 2024-06-05 15:49:15 +02:00
Laurent Mazare 7ebc3548e1
Use flash-attn in gemma. (#2195)
* Use flash-attn in gemma.

* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
Laurent Mazare 89f53b9d7b
Bump the version number to 0.5.1. (#2155)
* Bump the version number to 0.5.1.

* Fix clippy lints for 1.78.

* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare f76bb7794a
Bumping the version number to 0.5.0. (#2009) 2024-04-04 17:48:45 +02:00
Laurent Mazare e7fc1daa21
Bump the crate versions to 0.4.2. (#1821) 2024-03-08 22:01:51 +01:00
Laurent Mazare 5e526abc8c
Bump the version number to 0.4.1. (#1768)
* Fix the block size for some cuda kernels.

* Bump the version number to 0.4.1.
2024-02-27 14:19:59 +01:00
Laurent Mazare a83ca2ece0
Bump the crate version to 0.4.0. (#1658) 2024-02-04 19:08:01 +01:00
Laurent Mazare 9e824ec810
Explicit version for packages that are not in the workspace. (#1642) 2024-01-31 18:57:38 +01:00
Nicolas Patry 30313c3081
Moving to a proper build crate `bindgen_cuda`. (#1531)
* Moving to a proper build crate `bindgen_cuda`.

* Fmt.
2024-01-07 12:29:24 +01:00
Laurent Mazare e72d52b1a2
Unpin more of the workspace relative dependencies. (#1535) 2024-01-07 12:26:20 +01:00
OlivierDehaene 8d1a57c9a0
chore: update flash attention kernels (#1518)
* chore: update flash attention kernels

* fmt

* remove unused kernels

* force f32

* correct stride
2024-01-05 18:28:55 +01:00
Laurent Mazare d35f0a1376
Bump the crate version to 0.3.3. (#1490) 2023-12-28 13:38:30 +01:00
Laurent Mazare 94817dac56
Bump the crate version to 0.3.2. (#1452) 2023-12-17 05:34:53 -06:00
Laurent Mazare a209ce8ceb
Update for 0.3.1. (#1324) 2023-11-11 18:48:52 +00:00
Laurent Mazare d2c3f14773
Fix for flash-attn. (#1310)
Co-authored-by: laurent <laurent@par2dc5-ai-prd-cl01dgx02.cm.cluster>
2023-11-10 10:27:27 +01:00
OlivierDehaene 75629981bc
feat: parse Cuda compute cap from env (#1066)
* feat: add support for multiple compute caps

* Revert to one compute cap

* fmt

* fix
2023-10-16 15:37:38 +01:00
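The commit above lets the build pick the target GPU compute capability up from the environment instead of probing for it. As a hedged illustration only (the variable name CUDA_COMPUTE_CAP and the fallback value are assumptions, not taken from this log), a build script might parse it like this:

```rust
// Hypothetical build.rs excerpt: read the target compute capability from the
// environment; CUDA_COMPUTE_CAP (e.g. "80" for sm_80) is an assumed name.
use std::env;

fn main() {
    // Re-run the build script whenever the override changes.
    println!("cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP");
    let cap: usize = match env::var("CUDA_COMPUTE_CAP") {
        Ok(v) => v.parse().expect("CUDA_COMPUTE_CAP must be an integer"),
        Err(_) => 80, // assumed default for this sketch (Ampere)
    };
    println!("cargo:warning=compiling flash-attn kernels for sm_{cap}");
}
```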
Laurent Mazare 096dee7073
Bump the version to 0.3.0. (#1014)
* Bump the version to 0.3.0.

* Changelog update.
2023-10-01 13:51:57 +01:00
Laurent Mazare 7dd8e12472
Bump the crate versions to v0.2.3. (#886)
* Bump the crate version.

* Also update the python bindings.
2023-09-18 12:14:03 +01:00
Laurent Mazare 2257f4d475
Bump the crate version + update the changelog. (#822) 2023-09-12 06:39:24 +01:00
Laurent Mazare 0e250aee4f
Shape with holes (#770)
* Shape with holes.

* rustfmt.
2023-09-08 08:38:13 +01:00
Zsombor cfcbec9fc7
Add small customization to the build (#768)
* Add ability to override the compiler used by NVCC from an environment variable

* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR

* Add the compilation failure to the readme, with a possible solution

* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
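The commit above adds two build-time knobs: overriding the compiler used by NVCC via an environment variable, and pointing CANDLE_FLASH_ATTN_BUILD_DIR at a custom (possibly relative) artifact directory. A minimal sketch of how a build script could honor the latter, assuming a fallback to cargo's OUT_DIR, follows; only the variable name comes from the commit message, the rest is illustrative.

```rust
// Hypothetical build.rs excerpt: resolve where compiled flash-attn artifacts
// are cached. CANDLE_FLASH_ATTN_BUILD_DIR is named in the commit above; the
// OUT_DIR fallback is an assumption for illustration.
use std::{env, fs, path::PathBuf};

fn main() {
    println!("cargo:rerun-if-env-changed=CANDLE_FLASH_ATTN_BUILD_DIR");
    let dir = env::var("CANDLE_FLASH_ATTN_BUILD_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(env::var("OUT_DIR").expect("OUT_DIR not set")));
    fs::create_dir_all(&dir).expect("failed to create the flash-attn build dir");
    println!("cargo:warning=flash-attn artifacts cached in {}", dir.display());
}
```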
Laurent Mazare ab0d9fbdd1
Properly set the is_bf16 flag. (#738) 2023-09-04 16:45:26 +01:00
Laurent Mazare f80fd44201
BF16 support for flash-attn. (#737) 2023-09-04 16:35:43 +01:00
Laurent Mazare d0cdea95a5
Add back the bf16 flash-attn kernels. (#730) 2023-09-04 07:50:52 +01:00
Laurent Mazare 618f4e4c78
Add some documentation. (#673)
* Add some documentation.

* Bump the crate version.
2023-08-30 11:54:00 +01:00
Laurent Mazare a3f97c143d
Bump the crate version + update CHANGELOG. (#628) 2023-08-27 18:17:11 +01:00
Laurent Mazare aba1e90797
Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
Laurent Mazare a8f61e66cc
Bump the crates version to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
Laurent Mazare 03be33eea4
Relax the requirements on CustomOp. (#486)
* Relax the requirements on CustomOp.

* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
Chengxu Yang ebcfd96d94
add c++17 flags (#452) 2023-08-15 15:29:34 +01:00
Laurent Mazare 531f23b4d0
Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare e72ba0b9e7
Add the license files. (#335) 2023-08-07 14:11:27 +01:00
Laurent Mazare 4fe8a02f88
Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
Laurent Mazare 03a421f714
Add some missing readme files. (#304) 2023-08-02 10:57:12 +01:00
Laurent Mazare d38943aadc
Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare 51e51da896
Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
Laurent Mazare 67834119fc
Fix the flash-attention function names. (#282) 2023-07-31 10:04:39 +01:00
Laurent Mazare 0ace420e66
Flash attention without padding (varlen). (#281)
* Expose the seqlen variable for flash-attn without padding.

* Fix the batched call.

* Adapt for the varlen variant.

* No need to set the batch strides when in varlen mode.

* Add a test (disabled at the moment).

* Get the test to work properly.
2023-07-31 09:45:39 +01:00
Laurent Mazare 3eb2bc6d07
Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
Laurent Mazare 4f92420132
Add some flash attn test (#253)
* Add some flash-attn test.

* Add the cpu test.

* Fail when the head is not a multiple of 8.

* Polish the flash attention test.
2023-07-26 20:56:00 +01:00
Laurent Mazare 1235aa2536
Use bail rather than wrapping a string where possible. (#249)
* Use bail rather than wrapping a string where possible.

* Revert the cuda default bit.
2023-07-26 15:42:46 +01:00
Laurent Mazare f052ba76cb
Lining up the flash attn version with the non-flash one. (#248)
* Move the flash-attn function in the proper crate.

* Causality tweak.
2023-07-26 15:11:45 +01:00
Laurent Mazare 2ce5f12513
Again set a few extra params in flash-attn. (#245)
* Again set a few extra params.

* Use the appropriate kernel sizes.

* Add all the kernel sizes.

* Parallel compiling.

* Reduce the amount of parallelism.

* Add the missing kernel.

* Fix a typo.

* Remove bf16 support for now.
2023-07-26 14:16:37 +01:00
Laurent Mazare fa2b64d678
Proper flash-attn parameters. (#244)
* Proper flash-attn parameters.

* Set the flash attention parameters.

* Add more validations.

* Setup the o_ flash attn parameters.

* More flash-attn support.

* Set more flash attn parameters.
2023-07-26 10:13:40 +01:00
Laurent Mazare 471855e2ee
Specific cache dir for the flash attn build artifacts. (#242) 2023-07-26 08:04:02 +01:00
Laurent Mazare d9f9c859af
Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.

* More flash attn.

* Set up the flash attn parameters.

* Get things to compile locally.

* Move the flash attention files in a different directory.

* Build the static C library with nvcc.

* Add more flash attention.

* Update the build part.

* Better caching.

* Exclude flash attention from the default workspace.

* Put flash-attn behind a feature gate.

* Get the flash attn kernel to run.

* Move the flags to a more appropriate place.

* Enable flash attention in llama.

* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
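The commit above imports the flash-attn v2 kernels from Dao-AILab, puts them behind a feature gate, and wires them into llama. As a hedged illustration of how the resulting crate is typically called, here is a minimal sketch; the tensor shapes, dtype, and softmax-scale convention are assumptions based on common flash-attention usage rather than details taken from this log.

```rust
// Minimal sketch of calling the flash-attn kernel from Rust; assumes a CUDA
// device and half-precision inputs shaped (batch, seq_len, num_heads, head_dim).
// Treat this as illustrative, not as the exact API introduced by the commit.
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    let dev = Device::new_cuda(0)?;
    let (b, s, h, d) = (1, 128, 8, 64);
    let q = Tensor::randn(0f32, 1., (b, s, h, d), &dev)?.to_dtype(DType::F16)?;
    let k = q.clone();
    let v = q.clone();
    // Scale is conventionally 1/sqrt(head_dim); `true` requests a causal mask.
    let scale = 1f32 / (d as f32).sqrt();
    let out = candle_flash_attn::flash_attn(&q, &k, &v, scale, true)?;
    println!("{:?}", out.shape());
    Ok(())
}
```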