Laurent Mazare
8097559c1a
Move the candle version to 0.7.1. ( #2495 )
2024-09-22 20:44:39 +02:00
Laurent Mazare
c2fca0ca11
Bump the crate version. ( #2491 )
2024-09-21 15:13:12 +02:00
Laurent Mazare
6070278a31
Bump the version to 0.6.1. ( #2438 )
2024-08-22 09:23:52 +02:00
Laurent Mazare
30cdd769f9
Update the flash attn kernels. ( #2333 )
2024-07-15 20:37:36 +02:00
Laurent Mazare
f65e90e7ef
Bump the crate version. ( #2248 )
2024-06-05 15:49:15 +02:00
Laurent Mazare
7ebc3548e1
Use flash-attn in gemma. ( #2195 )
* Use flash-attn in gemma.
* Fix flash-attn for head dim 256.
2024-05-18 19:18:59 +02:00
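In the model code this change typically surfaces as a runtime switch between the fused kernel and the plain softmax(QK^T)V path. The sketch below shows that dispatch pattern as it appears across the candle example models, not the literal gemma code; the causal mask on the naive branch and the f16/bf16 conversion required by the kernel are elided and only noted in comments.

```rust
use candle_core::{Result, Tensor};

// Dispatch between flash attention and the naive path; q/k/v are assumed to be
// (batch, num_heads, seq_len, head_dim) as produced by the projection layers.
fn attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f64, use_flash_attn: bool) -> Result<Tensor> {
    if use_flash_attn {
        // The kernel expects (batch, seq_len, num_heads, head_dim) in f16/bf16 on CUDA.
        let q = q.transpose(1, 2)?;
        let k = k.transpose(1, 2)?;
        let v = v.transpose(1, 2)?;
        candle_flash_attn::flash_attn(&q, &k, &v, scale as f32, /* causal */ true)?.transpose(1, 2)
    } else {
        // Naive path; the causal mask is omitted here for brevity.
        let attn = (q.matmul(&k.t()?)? * scale)?;
        let attn = candle_nn::ops::softmax_last_dim(&attn)?;
        attn.matmul(v)
    }
}
```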
Laurent Mazare
89f53b9d7b
Bump the version number to 0.5.1. ( #2155 )
* Bump the version number to 0.5.1.
* Fix clippy lints for 1.78.
* More clippy fixes.
2024-05-03 11:17:05 +02:00
Laurent Mazare
f76bb7794a
Bumping the version number to 0.5.0. ( #2009 )
2024-04-04 17:48:45 +02:00
Laurent Mazare
e7fc1daa21
Bump the crate versions to 0.4.2. ( #1821 )
2024-03-08 22:01:51 +01:00
Laurent Mazare
5e526abc8c
Bump the version number to 0.4.1. ( #1768 )
* Fix the block size for some cuda kernels.
* Bump the version number to 0.4.1.
2024-02-27 14:19:59 +01:00
Laurent Mazare
a83ca2ece0
Bump the crate version to 0.4.0. ( #1658 )
2024-02-04 19:08:01 +01:00
Laurent Mazare
9e824ec810
Explicit version for packages that are not in the workspace. ( #1642 )
2024-01-31 18:57:38 +01:00
Nicolas Patry
30313c3081
Moving to a proper build crate `bindgen_cuda`. ( #1531 )
* Moving to a proper build crate `bindgen_cuda`.
* Fmt.
2024-01-07 12:29:24 +01:00
Laurent Mazare
e72d52b1a2
Unpin more of the workspace relative dependencies. ( #1535 )
2024-01-07 12:26:20 +01:00
OlivierDehaene
8d1a57c9a0
chore: update flash attention kernels ( #1518 )
* chore: update flash attention kernels
* fmt
* remove unused kernels
* force f32
* correct stride
2024-01-05 18:28:55 +01:00
Laurent Mazare
d35f0a1376
Bump the crate version to 0.3.3. ( #1490 )
2023-12-28 13:38:30 +01:00
Laurent Mazare
94817dac56
Bump the crate version to 0.3.2. ( #1452 )
2023-12-17 05:34:53 -06:00
Laurent Mazare
a209ce8ceb
Update for 0.3.1. ( #1324 )
2023-11-11 18:48:52 +00:00
Laurent Mazare
d2c3f14773
Fix for flash-attn. ( #1310 )
Co-authored-by: laurent <laurent@par2dc5-ai-prd-cl01dgx02.cm.cluster>
2023-11-10 10:27:27 +01:00
OlivierDehaene
75629981bc
feat: parse Cuda compute cap from env ( #1066 )
* feat: add support for multiple compute caps
* Revert to one compute cap
* fmt
* fix
2023-10-16 15:37:38 +01:00
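For builds where auto-detection is not wanted, the compute capability can be pinned via the CUDA_COMPUTE_CAP environment variable, e.g. CUDA_COMPUTE_CAP=80 cargo build --features flash-attn. A simplified sketch of the build-script side of that logic follows; the fixed fallback value and the omission of GPU probing are assumptions for the sketch, the real build.rs does more.

```rust
// build.rs excerpt, simplified: prefer an explicit CUDA_COMPUTE_CAP value and
// fall back to a fixed default (probing the installed GPU is elided here).
fn compute_cap() -> usize {
    println!("cargo:rerun-if-env-changed=CUDA_COMPUTE_CAP");
    match std::env::var("CUDA_COMPUTE_CAP") {
        Ok(v) => v
            .parse::<usize>()
            .unwrap_or_else(|_| panic!("invalid CUDA_COMPUTE_CAP: {v}")),
        Err(_) => 80, // hypothetical fallback used only for this sketch
    }
}
```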
Laurent Mazare
096dee7073
Bump the version to 0.3.0. ( #1014 )
* Bump the version to 0.3.0.
* Changelog update.
2023-10-01 13:51:57 +01:00
Laurent Mazare
7dd8e12472
Bump the crate versions to v0.2.3. ( #886 )
* Bump the crate version.
* Also update the python bindings.
2023-09-18 12:14:03 +01:00
Laurent Mazare
2257f4d475
Bump the crate version + update the changelog. ( #822 )
2023-09-12 06:39:24 +01:00
Laurent Mazare
0e250aee4f
Shape with holes ( #770 )
* Shape with holes.
* rustfmt.
2023-09-08 08:38:13 +01:00
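"Shape with holes" refers to reshape targets where one dimension is left for candle to infer, written as (). A small usage sketch, assuming the ()-hole syntax this change introduces:

```rust
use candle_core::{Device, Result, Tensor};

fn reshape_with_hole() -> Result<Tensor> {
    let t = Tensor::arange(0f32, 24., &Device::Cpu)?;
    // `()` is the hole: its size is inferred so the element count still matches.
    t.reshape((2, (), 4)) // -> shape (2, 3, 4)
}
```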
Zsombor
cfcbec9fc7
Add small customization to the build ( #768 )
* Add ability to override the compiler used by NVCC from an environment variable
* Allow relative paths in CANDLE_FLASH_ATTN_BUILD_DIR
* Add the compilation failure to the readme, with a possible solution
* Adjust the error message, and remove the special handling of the relative paths
2023-09-08 08:15:14 +01:00
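Both knobs in this change are environment variables read at build time: CANDLE_FLASH_ATTN_BUILD_DIR selects where the compiled kernels are cached, and a second variable picks the host compiler handed to nvcc via -ccbin. A rough sketch of that build-script logic; NVCC_HOST_COMPILER below is a placeholder name, the real variable is not spelled out in the log above.

```rust
use std::path::PathBuf;

// Simplified build.rs logic for the two overrides; standard-library calls only.
fn build_config() -> (PathBuf, Option<String>) {
    // Cache directory for the compiled flash-attn artifacts.
    let build_dir = std::env::var("CANDLE_FLASH_ATTN_BUILD_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|_| PathBuf::from(std::env::var("OUT_DIR").unwrap()));
    // Optional host compiler forwarded to nvcc as `-ccbin <compiler>`.
    let host_compiler = std::env::var("NVCC_HOST_COMPILER").ok();
    (build_dir, host_compiler)
}
```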
Laurent Mazare
ab0d9fbdd1
Properly set the is_bf16 flag. ( #738 )
2023-09-04 16:45:26 +01:00
Laurent Mazare
f80fd44201
BF16 support for flash-attn. ( #737 )
2023-09-04 16:35:43 +01:00
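With bf16 kernels in place, the same entry point accepts bf16 tensors; the only caller-side change is the dtype conversion. A tiny sketch, assuming the flash_attn signature shown further down this log:

```rust
use candle_core::{DType, Result, Tensor};

// Same call as the f16 path, just with bf16 inputs.
fn bf16_attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32) -> Result<Tensor> {
    let q = q.to_dtype(DType::BF16)?;
    let k = k.to_dtype(DType::BF16)?;
    let v = v.to_dtype(DType::BF16)?;
    candle_flash_attn::flash_attn(&q, &k, &v, scale, /* causal */ true)
}
```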
Laurent Mazare
d0cdea95a5
Add back the bf16 flash-attn kernels. ( #730 )
2023-09-04 07:50:52 +01:00
Laurent Mazare
618f4e4c78
Add some documentation. ( #673 )
* Add some documentation.
* Bump the crate version.
2023-08-30 11:54:00 +01:00
Laurent Mazare
a3f97c143d
Bump the crate version + update CHANGELOG. ( #628 )
2023-08-27 18:17:11 +01:00
Laurent Mazare
aba1e90797
Add some group parameter to convolutions. ( #566 )
* Add some group parameter to convolutions.
* Avoid some unnecessary groups checks.
* Move the tensor convolution bits.
* Proper handling of groups.
* Bump the crate version.
* And add a changelog.
2023-08-23 12:58:55 +01:00
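The groups parameter splits the channels into independent convolution groups (groups equal to the input channel count gives a depthwise convolution). A hedged sketch of the resulting tensor-level call; the padding/stride/dilation/groups argument order matches current candle-core and the shapes are only illustrative.

```rust
use candle_core::{Device, Result, Tensor};

fn grouped_conv_example() -> Result<Tensor> {
    let dev = Device::Cpu;
    // Input: (batch, in_channels, h, w); kernel: (out_channels, in_channels / groups, kh, kw).
    let xs = Tensor::randn(0f32, 1.0, (1, 8, 16, 16), &dev)?;
    let ws = Tensor::randn(0f32, 1.0, (8, 2, 3, 3), &dev)?;
    // padding = 1, stride = 1, dilation = 1, groups = 4.
    xs.conv2d(&ws, 1, 1, 1, 4)
}
```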
Laurent Mazare
a8f61e66cc
Bump the crates version to 0.1.2. ( #522 )
2023-08-20 08:07:07 +01:00
Laurent Mazare
03be33eea4
Relax the requirements on CustomOp. ( #486 )
* Relax the requirements on CustomOp.
* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
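The relaxation means a custom op only has to supply a forward pass; the backward is optional and simply errors out if autodiff is requested. A rough sketch of a backward-free CPU-only op under that interface, assuming the CustomOp1 trait and the apply_op1 helper keep roughly this shape (details vary across candle versions):

```rust
use candle_core::{CpuStorage, CustomOp1, Layout, Result, Shape, Tensor};

// Toy op that squares every element; only the CPU forward is provided.
struct Square;

impl CustomOp1 for Square {
    fn name(&self) -> &'static str {
        "square"
    }

    fn cpu_fwd(&self, storage: &CpuStorage, layout: &Layout) -> Result<(CpuStorage, Shape)> {
        // Sketch: assumes a contiguous f32 tensor for brevity.
        let xs = storage.as_slice::<f32>()?;
        let ys: Vec<f32> = xs.iter().map(|x| x * x).collect();
        Ok((CpuStorage::F32(ys), layout.shape().clone()))
    }
}

fn square(t: &Tensor) -> Result<Tensor> {
    t.apply_op1(Square)
}
```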
Chengxu Yang
ebcfd96d94
add c++17 flags ( #452 )
2023-08-15 15:29:34 +01:00
Laurent Mazare
531f23b4d0
Rename vec-dot to vec-ops. ( #449 )
* Rename vec-dot to vec-ops.
* Also bump the crate version.
* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare
e72ba0b9e7
Add the license files. ( #335 )
2023-08-07 14:11:27 +01:00
Laurent Mazare
4fe8a02f88
Update the repo location. ( #305 )
2023-08-02 11:12:18 +01:00
Laurent Mazare
03a421f714
Add some missing readme files. ( #304 )
2023-08-02 10:57:12 +01:00
Laurent Mazare
d38943aadc
Add version numbers for all the candle crates ( #303 )
* Switch to candle-gemm for the time being.
* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare
51e51da896
Rename the candle crate to candle-core ( #301 )
* Rename to candle-core.
* More candle-core renaming.
2023-08-02 08:20:22 +01:00
Laurent Mazare
67834119fc
Fix the flash-attention function names. ( #282 )
2023-07-31 10:04:39 +01:00
Laurent Mazare
0ace420e66
Flash attention without padding (varlen). ( #281 )
* Expose the seqlen variable for flash-attn without padding.
* Fix the batched call.
* Adapt for the varlen variant.
* No need to set the batch strides when in varlen mode.
* Add a test (disabled at the moment).
* Get the test to work properly.
2023-07-31 09:45:39 +01:00
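The varlen entry point takes unpadded, packed sequences plus cumulative sequence lengths instead of a padded batch. The sketch below assumes the upstream argument order (packed q/k/v, cumulative lengths for q and k, the two max sequence lengths, the softmax scale and the causal flag) and u32 length tensors; both are assumptions about the exact interface.

```rust
use candle_core::{DType, Device, Result, Tensor};

// Two sequences of lengths 3 and 5 packed into one (total_tokens, num_heads, head_dim)
// tensor, so no padding tokens are ever materialised.
fn varlen_example(device: &Device) -> Result<Tensor> {
    let (heads, head_dim) = (8, 64);
    let total = 3 + 5;
    let q = Tensor::randn(0f32, 1.0, (total, heads, head_dim), device)?.to_dtype(DType::F16)?;
    let k = q.clone();
    let v = q.clone();
    // Cumulative sequence lengths: one entry per sequence boundary, starting at 0.
    let seqlens = Tensor::new(&[0u32, 3, 8], device)?;
    let scale = 1.0 / (head_dim as f32).sqrt();
    candle_flash_attn::flash_attn_varlen(
        &q, &k, &v, &seqlens, &seqlens, /* max_seqlen_q */ 5, /* max_seqlen_k */ 5, scale, true,
    )
}
```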
Laurent Mazare
3eb2bc6d07
Softmax numerical stability. ( #267 )
* Softmax numerical stability.
* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
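The fix is the standard max-subtraction trick: softmax(x) = softmax(x - max(x)), which keeps exp() bounded without changing the result. A minimal sketch of that computation with candle tensor ops, as an approximation of what happens internally (method names as in current candle-core):

```rust
use candle_core::{D, Result, Tensor};

// Numerically stable softmax over the last dimension: subtracting the row-wise
// max leaves the output unchanged but avoids overflow in exp().
fn stable_softmax(xs: &Tensor) -> Result<Tensor> {
    let max = xs.max_keepdim(D::Minus1)?;
    let exps = xs.broadcast_sub(&max)?.exp()?;
    let sum = exps.sum_keepdim(D::Minus1)?;
    exps.broadcast_div(&sum)
}
```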
Laurent Mazare
4f92420132
Add some flash attn test ( #253 )
* Add some flash-attn test.
* Add the cpu test.
* Fail when the head dim is not a multiple of 8.
* Polish the flash attention test.
2023-07-26 20:56:00 +01:00
Laurent Mazare
1235aa2536
Use bail rather than wrapping a string where possible. ( #249 )
* Use bail rather than wrapping a string where possible.
* Revert the cuda default bit.
2023-07-26 15:42:46 +01:00
Laurent Mazare
f052ba76cb
Lining up the flash attn version with the non-flash one. ( #248 )
* Move the flash-attn function in the proper crate.
* Causality tweak.
2023-07-26 15:11:45 +01:00
Laurent Mazare
2ce5f12513
Again set a few extra params in flash-attn. ( #245 )
* Again set a few extra params.
* Use the appropriate kernel sizes.
* Add all the kernel sizes.
* Parallel compiling.
* Reduce the amount of parallelism.
* Add the missing kernel.
* Fix a typo.
* Remove bf16 support for now.
2023-07-26 14:16:37 +01:00
Laurent Mazare
fa2b64d678
Proper flash-attn parameters. ( #244 )
* Proper flash-attn parameters.
* Set the flash attention parameters.
* Add more validations.
* Setup the o_ flash attn parameters.
* More flash-attn support.
* Set more flash attn parameters.
2023-07-26 10:13:40 +01:00
Laurent Mazare
471855e2ee
Specific cache dir for the flash attn build artifacts. ( #242 )
2023-07-26 08:04:02 +01:00
Laurent Mazare
d9f9c859af
Add flash attention ( #241 )
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.
* More flash attn.
* Set up the flash attn parameters.
* Get things to compile locally.
* Move the flash attention files in a different directory.
* Build the static C library with nvcc.
* Add more flash attention.
* Update the build part.
* Better caching.
* Exclude flash attention from the default workspace.
* Put flash-attn behind a feature gate.
* Get the flash attn kernel to run.
* Move the flags to a more appropriate place.
* Enable flash attention in llama.
* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
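For reference, the kernel added here ends up exposed through the candle-flash-attn crate, feature-gated as flash-attn in the workspace. A minimal usage sketch; the (batch, seq_len, num_heads, head_dim) layout, the f16 dtype and the CUDA-only constraint reflect the flash-attn v2 kernels, but treat the details as assumptions rather than a spec.

```rust
use candle_core::{DType, Device, Result, Tensor};

fn main() -> Result<()> {
    // Flash attention only runs on CUDA and expects f16/bf16 inputs.
    let device = Device::new_cuda(0)?;
    let (b, seq, heads, head_dim) = (1, 128, 8, 64);
    let q = Tensor::randn(0f32, 1.0, (b, seq, heads, head_dim), &device)?.to_dtype(DType::F16)?;
    let k = Tensor::randn(0f32, 1.0, (b, seq, heads, head_dim), &device)?.to_dtype(DType::F16)?;
    let v = Tensor::randn(0f32, 1.0, (b, seq, heads, head_dim), &device)?.to_dtype(DType::F16)?;
    let scale = 1.0 / (head_dim as f32).sqrt();
    // Causal flash attention over the (batch, seq_len, num_heads, head_dim) layout.
    let out = candle_flash_attn::flash_attn(&q, &k, &v, scale, /* causal */ true)?;
    println!("{:?}", out.shape());
    Ok(())
}
```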