Commit Graph

22 Commits

Author SHA1 Message Date
Laurent Mazare 618f4e4c78
Add some documentation. (#673)
* Add some documentation.

* Bump the crate version.
2023-08-30 11:54:00 +01:00
Laurent Mazare a3f97c143d
Bump the crate version + update CHANGELOG. (#628) 2023-08-27 18:17:11 +01:00
Laurent Mazare aba1e90797
Add some group parameter to convolutions. (#566)
* Add some group parameter to convolutions.

* Avoid some unnecessary groups checks.

* Move the tensor convolution bits.

* Proper handling of groups.

* Bump the crate version.

* And add a changelog.
2023-08-23 12:58:55 +01:00
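
The group parameter from #566 splits a convolution's input and output channels into `groups` independent sub-convolutions, so each filter only sees `c_in / groups` input channels. A minimal sketch of the shape constraint grouped convolutions impose, in plain Rust (illustrative only, not candle's actual validation code):

```rust
/// Illustrative check for a grouped convolution (not candle's actual code):
/// with `groups = g`, the input channels are split into g independent groups,
/// so both channel counts must be divisible by g and each filter only sees
/// `c_in / g` input channels.
fn check_grouped_conv(c_in: usize, c_out: usize, kernel_c_in: usize, groups: usize) -> Result<(), String> {
    if groups == 0 || c_in % groups != 0 || c_out % groups != 0 {
        return Err(format!(
            "c_in ({c_in}) and c_out ({c_out}) must both be divisible by groups ({groups})"
        ));
    }
    if kernel_c_in != c_in / groups {
        return Err(format!(
            "kernel expects {kernel_c_in} input channels per group, but c_in / groups = {}",
            c_in / groups
        ));
    }
    Ok(())
}
```

With `groups = 1` this reduces to an ordinary convolution, and `groups = c_in` gives a depthwise convolution.
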
Laurent Mazare a8f61e66cc
Bump the crate versions to 0.1.2. (#522) 2023-08-20 08:07:07 +01:00
Laurent Mazare 03be33eea4
Relax the requirements on CustomOp. (#486)
* Relax the requirements on CustomOp.

* Simplify the custom-ops when no backward is required.
2023-08-17 11:12:05 +01:00
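
Letting custom ops skip the backward pass (#486) is naturally expressed with a default trait method: forward-only ops implement just the forward computation, and requesting a gradient yields a clear error. A generic sketch of that pattern (the trait and method names below are placeholders, not candle's actual CustomOp API):

```rust
// Placeholder standing in for a real tensor type.
struct Tensor;

trait SimpleCustomOp {
    fn name(&self) -> &'static str;
    fn forward(&self, input: &Tensor) -> Tensor;

    // Default implementation: inference-only ops don't have to provide a
    // gradient; asking for one produces a clear runtime error instead.
    fn backward(&self, _input: &Tensor, _grad_out: &Tensor) -> Result<Tensor, String> {
        Err(format!("no backward implemented for custom op '{}'", self.name()))
    }
}

// An op that only needs a forward pass implements just two methods.
struct Relu;

impl SimpleCustomOp for Relu {
    fn name(&self) -> &'static str {
        "relu"
    }
    fn forward(&self, _input: &Tensor) -> Tensor {
        Tensor
    }
}

fn main() {
    let op = Relu;
    println!("op = {}, backward = {:?}", op.name(), op.backward(&Tensor, &Tensor));
}
```
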
Chengxu Yang ebcfd96d94
add c++17 flags (#452) 2023-08-15 15:29:34 +01:00
Laurent Mazare 531f23b4d0
Rename vec-dot to vec-ops. (#449)
* Rename vec-dot to vec-ops.

* Also bump the crate version.

* Add a currently empty readme.
2023-08-15 10:48:57 +01:00
Laurent Mazare e72ba0b9e7
Add the license files. (#335) 2023-08-07 14:11:27 +01:00
Laurent Mazare 4fe8a02f88
Update the repo location. (#305) 2023-08-02 11:12:18 +01:00
Laurent Mazare 03a421f714
Add some missing readme files. (#304) 2023-08-02 10:57:12 +01:00
Laurent Mazare d38943aadc
Add version numbers for all the candle crates (#303)
* Switch to candle-gemm for the time being.

* Add the missing versions.
2023-08-02 10:52:13 +01:00
Laurent Mazare 51e51da896
Rename the candle crate to candle-core (#301)
* Rename to candle-core.

* More candle-core renaming.
2023-08-02 08:20:22 +01:00
Laurent Mazare 67834119fc
Fix the flash-attention function names. (#282) 2023-07-31 10:04:39 +01:00
Laurent Mazare 0ace420e66
Flash attention without padding (varlen). (#281)
* Expose the seqlen variable for flash-attn without padding.

* Fix the batched call.

* Adapt for the varlen variant.

* No need to set the batch strides when in varlen mode.

* Add a test (disabled at the moment).

* Get the test to work properly.
2023-07-31 09:45:39 +01:00
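
The varlen variant of flash attention (#281) packs sequences of different lengths into a single tensor with no padding; instead of batch strides, the kernel receives cumulative per-sequence offsets. A small sketch of how such offsets are typically built, following the flash-attn `cu_seqlens` convention (not candle's API):

```rust
/// Build cumulative sequence offsets for a packed (unpadded) batch, in the style
/// of flash-attn's `cu_seqlens`: lengths [3, 5, 2] become [0, 3, 8, 10], so
/// sequence i occupies rows cu[i]..cu[i + 1] of the packed tensor.
fn cu_seqlens(seq_lens: &[usize]) -> Vec<usize> {
    let mut cu = Vec::with_capacity(seq_lens.len() + 1);
    let mut total = 0;
    cu.push(0);
    for &len in seq_lens {
        total += len;
        cu.push(total);
    }
    cu
}

fn main() {
    assert_eq!(cu_seqlens(&[3, 5, 2]), vec![0, 3, 8, 10]);
}
```
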
Laurent Mazare 3eb2bc6d07
Softmax numerical stability. (#267)
* Softmax numerical stability.

* Fix the flash-attn test.
2023-07-28 13:13:01 +01:00
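
The standard fix for softmax numerical stability, and presumably what #267 applies, is to subtract the row maximum before exponentiating: the result is mathematically unchanged, but exp no longer overflows on large logits. A minimal scalar sketch of the technique (not candle's actual kernel):

```rust
/// Numerically stable softmax over a slice: softmax(x) == softmax(x - max(x)),
/// so subtracting the maximum keeps exp() from overflowing on large inputs.
fn softmax(xs: &[f32]) -> Vec<f32> {
    let max = xs.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = xs.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

fn main() {
    // Without the max subtraction, exp(1000.0) would overflow to infinity in f32.
    println!("{:?}", softmax(&[1000.0, 1001.0, 1002.0]));
}
```
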
Laurent Mazare 4f92420132
Add some flash attn test (#253)
* Add some flash-attn test.

* Add the cpu test.

* Fail when the head is not a multiple of 8.

* Polish the flash attention test.
2023-07-26 20:56:00 +01:00
Laurent Mazare 1235aa2536
Use bail rather than wrapping a string where possible. (#249)
* Use bail rather than wrapping a string where possible.

* Revert the cuda default bit.
2023-07-26 15:42:46 +01:00
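
The change in #249 is the usual Rust ergonomics win: a `bail!`-style macro returns early with a formatted error instead of hand-wrapping a string in `Err(...)` at every call site. A self-contained sketch of the idiom, using a toy macro and `String` errors rather than candle's own error type and macro:

```rust
// Toy bail! macro: return early with a formatted error. candle-core and anyhow
// provide the real thing; this stand-in just uses String as the error type.
macro_rules! bail {
    ($($arg:tt)*) => { return Err(format!($($arg)*)) };
}

fn check_rank(rank: usize) -> Result<(), String> {
    if rank != 4 {
        // Instead of: return Err(format!("expected a rank-4 tensor, got {rank}"))
        bail!("expected a rank-4 tensor, got {rank}");
    }
    Ok(())
}

fn main() {
    println!("{:?}", check_rank(3));
}
```
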
Laurent Mazare f052ba76cb
Lining up the flash attn version with the non-flash one. (#248)
* Move the flash-attn function in the proper crate.

* Causality tweak.
2023-07-26 15:11:45 +01:00
Laurent Mazare 2ce5f12513
Again set a few extra params in flash-attn. (#245)
* Again set a few extra params.

* Use the appropriate kernel sizes.

* Add all the kernel sizes.

* Parallel compiling.

* Reduce the amount of parallelism.

* Add the missing kernel.

* Fix a typo.

* Remove bf16 support for now.
2023-07-26 14:16:37 +01:00
Laurent Mazare fa2b64d678
Proper flash-attn parameters. (#244)
* Proper flash-attn parameters.

* Set the flash attention parameters.

* Add more validations.

* Setup the o_ flash attn parameters.

* More flash-attn support.

* Set more flash attn parameters.
2023-07-26 10:13:40 +01:00
Laurent Mazare 471855e2ee
Specific cache dir for the flash attn build artifacts. (#242) 2023-07-26 08:04:02 +01:00
Laurent Mazare d9f9c859af
Add flash attention (#241)
* Add some flash-attn kernel, import the code for flash-attn v2 from Dao-AILab.

* More flash attn.

* Set up the flash attn parameters.

* Get things to compile locally.

* Move the flash attention files in a different directory.

* Build the static C library with nvcc.

* Add more flash attention.

* Update the build part.

* Better caching.

* Exclude flash attention from the default workspace.

* Put flash-attn behind a feature gate.

* Get the flash attn kernel to run.

* Move the flags to a more appropriate place.

* Enable flash attention in llama.

* Use flash attention in llama.
2023-07-26 07:48:10 +01:00
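
Putting flash-attn behind a feature gate in #241 follows the standard `cfg` pattern: the CUDA-only fused kernel is compiled only when the optional feature is enabled, and other builds fall back to a plain softmax(QKᵀ/√d)·V path. A sketch of the pattern with placeholder types (the "flash-attn" feature name matches the commit; everything else here is illustrative, not candle's actual attention code):

```rust
// Placeholder tensor type; the real code operates on candle tensors.
struct Tensor;

// Fallback path: an ordinary softmax(Q K^T / sqrt(d)) V implementation.
fn naive_attention(_q: &Tensor, _k: &Tensor, _v: &Tensor) -> Tensor {
    Tensor
}

// Only compiled when the crate is built with `--features flash-attn`,
// so CPU-only builds never need the CUDA kernels.
#[cfg(feature = "flash-attn")]
fn fused_flash_attention(_q: &Tensor, _k: &Tensor, _v: &Tensor) -> Tensor {
    Tensor
}

// Exactly one of these two definitions exists in any given build.
#[cfg(feature = "flash-attn")]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Tensor {
    fused_flash_attention(q, k, v)
}

#[cfg(not(feature = "flash-attn"))]
fn attention(q: &Tensor, k: &Tensor, v: &Tensor) -> Tensor {
    naive_attention(q, k, v)
}

fn main() {
    let _out = attention(&Tensor, &Tensor, &Tensor);
}
```

Gating the kernels this way keeps the default build usable without a CUDA toolchain, which lines up with the "Exclude flash attention from the default workspace" step in the same commit.
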