ce9fbc3682
* Add a specialized kernel for copy2d.
* Move the cat operations.
* Avoid transpositions in cat.
* Bugfix.
* Bugfix for the cuda kernel.
* Add a benchmark.
* Add more testing.
* Test fix.
* Faster kernel.
* Add the missing kernel.
* Tweak the test.
* Add a metal kernel.
* Fix for the metal kernel.
* Get the tests to pass on metal.
* Also use this opportunity to fix the metal kernel for ELU.
* Add some bf16 kernels.
* Clippy fixes.

| Name |
|---|
| src |
| Cargo.toml |
| README.md |
| build.rs |

README.md
candle-kernels
This crate contains the CUDA kernels used by candle. Some of these implementations come from the dfdx crate.